
LinuxConf.eu: Documentation and user-space API design

By Jonathan Corbet
September 3, 2007
[Michael Kerrisk] Michael Kerrisk, the Linux man page maintainer since 2004, gave a talk on the value of documentation during the first day of LinuxConf Europe 2007. While documents are useful for end users trying to get their job done, this use was not Michael's focus; instead, he talked about how documentation can help in the creation of a better kernel in the first place. The writing of documents, he says, reveals bugs and bad interface designs before they become part of a released kernel. And that can help to prevent a great deal of pain for both kernel and user-space developers.

Michael presented three examples to show how the process of writing documentation can turn up bugs:

  • The inotify interface was added to the 2.6.13 kernel as an improved way for an application to request notifications when changes are made to directories and files. Around 2.6.16, Michael got around to writing a manual page for this interface, only to find that one option (IN_ONESHOT) had never worked. Once the problem was found it was quickly fixed, but that did not happen until an effort was made to document the interface. (A minimal sketch of inotify usage appears just after this list.)

  • splice() was added in 2.6.17. Michael found that it was easy to write programs which would go into an unkillable hang; clogging the system with hung processes was also easy. Again, once the problem was found, it was fixed quickly.

  • The timerfd() interface, as merged in 2.6.22, did not work properly. It also has some design issues, as covered in this article.
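For illustration (this sketch is not from the talk), here is roughly the sort of test program that would have exposed the IN_ONESHOT problem; the directory argument and event mask are arbitrary choices:

    /* Watch a directory for a single event; IN_ONESHOT asks the kernel
       to remove the watch once the first event has been delivered. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/inotify.h>

    int main(int argc, char *argv[])
    {
        char buf[4096];
        int fd, wd;
        ssize_t n;

        if (argc != 2) {
            fprintf(stderr, "usage: %s <directory>\n", argv[0]);
            exit(EXIT_FAILURE);
        }
        fd = inotify_init();
        if (fd == -1) {
            perror("inotify_init");
            exit(EXIT_FAILURE);
        }
        wd = inotify_add_watch(fd, argv[1], IN_CREATE | IN_ONESHOT);
        if (wd == -1) {
            perror("inotify_add_watch");
            exit(EXIT_FAILURE);
        }
        /* Blocks until something is created in the directory. */
        n = read(fd, buf, sizeof(buf));
        printf("read %zd bytes of events\n", n);
        /* With a working IN_ONESHOT the watch is now gone (the kernel
           also queues an IN_IGNORED event); a further read() would
           block forever. */
        return 0;
    }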

The existence of buggy interfaces in stable kernel releases is, says Michael, a result of insufficient testing of -rc kernels during the development process. Better documentation can help with this problem; it can also help with the API design process in the first place. Designing good APIs is hard, and is made harder by the fact that, for the kernel, API design mistakes must be maintained forever. So anything which can help in the creation of a good API can only be a good thing.

The characteristics of a good API include simplicity, ease of use, generality, consistency with other interfaces, and integration with other interfaces. Bad designs, instead, lack those characteristics. As an example, Michael discussed the dnotify interface - the previous attempt to provide a file-change notification service. Dnotify suffered as a result of its use of signals, which never leads to an easy-to-use interface. It was only able to monitor directories, not individual files. It required keeping an open file descriptor, thus preventing the unmounting of any filesystem where dnotify was in use. And the amount of information provided to applications was limited.
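A sketch of a minimal dnotify client (again, not code from the talk; the signal number and event mask are arbitrary) makes those complaints concrete:

    /* dnotify: a real-time signal must be chosen, a descriptor must be
       held open on the directory (pinning its filesystem), and the
       notification says only *which directory* changed - not which
       file, or how. */
    #define _GNU_SOURCE
    #include <stdio.h>
    #include <stdlib.h>
    #include <fcntl.h>
    #include <signal.h>
    #include <unistd.h>

    static void handler(int sig, siginfo_t *si, void *ucontext)
    {
        /* si->si_fd identifies the watched directory; the application
           must rescan it to find out what actually happened. */
    }

    int main(int argc, char *argv[])
    {
        struct sigaction sa;
        int fd;

        if (argc != 2) {
            fprintf(stderr, "usage: %s <directory>\n", argv[0]);
            exit(EXIT_FAILURE);
        }
        sa.sa_sigaction = handler;
        sa.sa_flags = SA_SIGINFO;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGRTMIN + 1, &sa, NULL);

        fd = open(argv[1], O_RDONLY);   /* stays open; blocks unmounting */
        if (fd == -1) {
            perror("open");
            exit(EXIT_FAILURE);
        }
        fcntl(fd, F_SETSIG, SIGRTMIN + 1);
        fcntl(fd, F_NOTIFY, DN_CREATE | DN_MODIFY | DN_MULTISHOT);

        pause();                        /* wait for a notification */
        return 0;
    }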

Another example was made of the mlock() and remap_file_pages() system calls. Both have start and length arguments to specify the range of memory to be affected. The mlock() interface rounds the length argument up to the next page boundary, while remap_file_pages() rounds it down. The two system calls also differ in whether the start offset is taken into account when rounding the length. As a result, a call like:

    mlock (4000, 6000);

will affect bytes 0..12287, while

    remap_file_pages (4000, 6000, ...);

affects bytes 0..4095. This sort of inconsistency makes these system calls harder for developers to use.
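The arithmetic can be checked with a short program; this sketch assumes 4096-byte pages and simply reproduces the rounding rules described above, rather than calling into the kernel:

    #include <stdio.h>

    #define PAGE_SIZE 4096UL

    int main(void)
    {
        unsigned long start = 4000, len = 6000;
        unsigned long base = start & ~(PAGE_SIZE - 1);    /* page 0 */

        /* mlock() extends [start, start+len) to whole pages, rounding
           the end of the range up. */
        unsigned long mlock_end =
            ((start + len + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1)) - 1;

        /* remap_file_pages() rounds the length alone down to whole
           pages before applying it. */
        unsigned long remap_end = base + (len / PAGE_SIZE) * PAGE_SIZE - 1;

        printf("mlock:            bytes %lu..%lu\n", base, mlock_end);
        printf("remap_file_pages: bytes %lu..%lu\n", base, remap_end);
        return 0;    /* prints 0..12287 and 0..4095 */
    }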

Many bits can be expended on how bad these interfaces are. But, asks Michael, was it all really the developer's fault? Or did the lack of a review process contribute to these problems?

Many of these difficulties result from the fact that the designers of system call interfaces (kernel hackers) are not generally the users of those interfaces. To make things better, Michael put forward a proposal to formalize the system call interface development process. He acknowledges that this sort of formalization is a hard sell, but the need to create excellent interfaces from the first release makes it necessary. So he would like to see a formal signoff requirement for APIs - though who would be signing off on them was not specified. There would need to be a design review, full documentation of the interface, and a test suite before this signoff could happen. The test suite would need to be at least partially written by people other than the developer, who will never be able to imagine all of the crazy things users might try to do with a new interface.

The documentation requirement is an important part of the process. Writing documentation for an interface will often reveal bugs or bad design decisions. Beyond that, good documentation makes the nature of the interface easier for others to understand, resulting in more review and more testing of a proposed interface. Without testing from application developers, problems in new APIs will often not be found until after they have been made part of a stable kernel release, and that is too late.

In the question period, it was asserted that getting application developers to try out system calls in -rc kernels is always going to be hard. An alternative idea, which has been heard before, would be to mark new system calls as "experimental" for a small number of kernel release cycles after they are first added. Then it would be possible to try out new system calls without having to run development kernels and still have a chance to influence the final form of the new API. It might be easier to get the kernel developers to agree to this kind of policy than to get them to agree to an elaborate formal review process, but it still represents a policy change which would have to be discussed. That discussion could happen soon; how it goes will depend on just how many developers really feel that there is a problem with how user-space APIs are designed and deployed now.

[Arnd Bergmann] The next day, Arnd Bergmann gave a talk on how not to design kernel interfaces. Good interfaces, he says, are designed with "taste," but deciding what has taste is not always easy. Taste is subjective and changes over time. But some characteristics of a tasteful interface are clear: simplicity, consistency, and using the right tool for the job. These are, of course, very similar to the themes raised by Michael the day before.

As is often the case, discussion of interface design is most easily done by pointing out the things one should not do. Arnd started in with system calls, which are the primary interface to the kernel. Adding new system calls is a hard thing to do; there is a lot of review which must be gotten through first (though, as discussed above, perhaps it's still not hard enough). But often the alternative to adding system calls can be worse; he raised the hypothetical idea of a /dev/exit device: a process which has completed its work could quit by opening and writing to that device. Such a scheme would allow the elimination of the exit() system call, but it would not be a more tasteful interface by any means.

The ioctl() system call has long been the target of criticism; it is not type safe, hard to script, and is an easy way to sneak in ABI changes without anybody noticing. On the other hand, it is well established, easy to extend, it works in modules, and it can be a good way to prototype system calls. Again, trying to avoid ioctl() can lead to worse things; Arnd presented an example from the InfiniBand code which interprets data written to a special file descriptor to execute commands. The result is essentially ioctl(), but even less clear.
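For readers who have not seen the pattern, this is roughly what the conventional ioctl() approach looks like; the "foo" device, its command numbers, and struct foo_status are invented for this sketch:

    #include <sys/ioctl.h>

    struct foo_status {
        int  ready;
        long frames;     /* 'long' here is itself a 32/64-bit trap */
    };

    /* Extending the ABI is as easy as picking an unused number... */
    #define FOO_GET_STATUS  _IOR('f', 1, struct foo_status)
    #define FOO_SET_DEBUG   _IOW('f', 2, int)

    /* ...but nothing is type safe: the third ioctl() argument is just a
       pointer-sized value, so the following compiles without complaint
       and scribbles past the end of 'wrong':

           int wrong;
           ioctl(fd, FOO_GET_STATUS, &wrong);
     */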

Sockets are a well-established interface which, Arnd says, would never be accepted into the kernel now. They are totally inconsistent with everything else, operate on devices which are not part of the device tree, have read and write calls which are not read() and write(), and so on. Netlink, by adding complexity to the socket interface, did not really help the user-space interface situation in general; its use is, he says, best avoided. But, importantly, it is better to use netlink than to reinvent it. The wireless extensions API was brought up as another example of how not to do things; putting wireless extensions over netlink turned out to be a way of combining the worst features of sockets and ioctl() into a single interface.

The "fashionable" way to design new interfaces now is with virtual filesystems. But troubles can be found there as well. /proc became a sort of dumping ground for new interfaces until the developers began to frown on additions there. Sysfs was meant to solve many of the problems with /proc, but it clearly has not solved the API stability problem. Virtual filesystems may well be the best way to create new interfaces, but there are many traps there.

Finally, there was some talk of designing interfaces to make ABI emulation easy. Arnd suggests that data structures should be the same in both kernel and user space. Avoid "long" variables and, whenever possible, avoid pointers as well. Structure padding - either explicit or caused by badly aligned fields - can lead to trouble. And so on.
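A sketch of that advice in practice (the structures and field names are invented for illustration):

    #include <stdint.h>

    struct bad_args {        /* 12 bytes on 32-bit ABIs, 24 on 64-bit */
        uint32_t flags;      /* followed by hidden padding on 64-bit */
        void    *buf;        /* pointer size varies */
        long     count;      /* 'long' size varies */
    };

    struct good_args {       /* 16 bytes on every ABI */
        uint32_t flags;
        uint32_t __pad;      /* padding made explicit */
        uint64_t buf;        /* pointer passed as a 64-bit integer */
    };                       /* no 'long', no pointers, every field
                                aligned to its own size */

A 64-bit kernel handed a struct bad_args from a 32-bit process must translate the layout field by field; struct good_args can be used as-is.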

All told, it was a lively session with a great deal of audience participation. There are many user-space interface design mistakes which are part of Linux and must be supported forever. There is also a great deal of interest in avoiding making more of those mistakes in the future. The problem remains a hard one, though, even with the benefit of a great deal of experience.

Index entries for this article
Conference: LinuxConf.eu/2007



LinuxConf.eu: Documentation and user-space API design

Posted Sep 3, 2007 20:16 UTC (Mon) by oak (guest, #2786)

"could quite by opening" -> "quit"

material for "user-space API design"

Posted Sep 3, 2007 20:26 UTC (Mon) by arnd (subscriber, #8866) (3 responses)

I have uploaded my presentation and the paper for it to userweb.kernel.org, for people who are interested in reading all of it.
The slides probably make sense only for people who attended my talk.

material for "documentation and APIs" presentation

Posted Sep 4, 2007 8:56 UTC (Tue) by mkerrisk (subscriber, #1978) (2 responses)

And I've uploaded a copy of the slides for my talk, "What we lose without words".

material for "documentation and APIs" presentation

Posted Sep 4, 2007 10:30 UTC (Tue) by nijhof (subscriber, #4034) (1 response)

File permissions seem to be wrong: I can see the directory, but for the pdf:

Forbidden

You don't have permission to access /~mtk/papers/lce2007/What_we_lose_without_words.pdf on this server.

material for "documentation and APIs" presentation

Posted Sep 4, 2007 11:12 UTC (Tue) by mkerrisk (subscriber, #1978)

> File permissions seem to be wrong:

Thanks -- fixed now.

Taste

Posted Sep 4, 2007 1:01 UTC (Tue) by ncm (guest, #165) (28 responses)

Examples of how contentious interface design can be are found in the infamous series of "Worse is Better" articles by Richard Gabriel. Of course some of us find it comical how he and his colleagues torture themselves over what are really quite easy questions (particularly when the obvious but intolerable answer is, simply, "not Lisp"). We have seen it proven again and again that interfaces that try to implement an abstraction the underlying mechanism isn't really up to supporting are supplanted by a less ambitious abstraction that matches what the underlying mechanism really does. Is this "worse"? Or are some people just more in love with their own ideas than with correctly functioning software?

Taste

Posted Sep 5, 2007 11:40 UTC (Wed) by ruoccolwn (guest, #2270) (27 responses)

> "Of course some of us find it comical how he and his colleagues torture
> themselves over what are really quite easy questions (particularly when
> the obvious but intolerable answer is, simply, "not Lisp")."

Can you elaborate your point? Gabriel and Lisp-ers in general are quite knowledgeable developers, as far as I can understand.

sergio

Taste

Posted Sep 5, 2007 18:30 UTC (Wed) by ncm (guest, #165) (26 responses)

Lisp has many qualities that cause people who learn it early in life to love it forever. It also has an extraordinary amount of baggage which, collectively, makes it unsuitable for most serious industrial work. Those who associate Lisp with their own youth, however intelligent and knowledgeable they may be, tend to be unable to distinguish the lovable qualities from the baggage.

Garbage collection seems to be among the worst of the baggage, for what must be subtle reasons, because these same knowledgeable and intelligent people seem unable to perceive them. Unfortunately GC sabotages many more modern languages as well, not least Haskell. It's not as clear what other features of Lisp make it unsuitable for industrial use. Gabriel himself says that pathologically slow algorithms, when written in Lisp, are the most natural and esthetically pleasing. Its dynamic type binding, which makes it (like most scripting languages) fun to use for coding small programs, becomes an increasingly debilitating liability for big programs.

The poster-boy application of Lisp in industry, ITA Software's QPX airline-fare search engine used by Orbitz.com, makes a good example. ITA goes to great lengths never to trigger a GC cycle, because the first one would take longer than restarting the program. Therefore, they call out to a huge C++ library to do floating-point calculations, XML parsing, database lookups, or anything that needs dynamic memory allocation. They use explicit integer type declarations throughout, and use peculiar idioms known to compile to optimal machine code for inner loops. Bugs often arise from integer overflows because they don't have a 64-bit integer type.

I've gone on at rather some length, off-topic, because you asked and I don't know how to answer any more concisely.

Taste

Posted Sep 6, 2007 9:36 UTC (Thu) by lysse (guest, #3190) (18 responses)

That oh-so-tiresome allegiance to garbage collection probably has something to do with forty years' worth of studies finding that GC is both faster and safer than manual memory management (indeed, as you back-handedly acknowledge, if your program quits before a GC cycle is needed GC is even cheaper than stack allocation). Moreover, many concurrent algorithms are known which eliminate long GC pauses in exchange for a memory float (and exhibit far better performance than reference counting), so unless ITA's Lisp is using a 1960-vintage mark'n'sweep collector it's not clear why they would have to care about GC cycles; and given Lisp's customisability, nor is it clear why they haven't replaced the collector with something a bit more modern.

Most people, when they argue against GC, take two tacks - "it's slow" (which has by now been thoroughly discredited; there are well known ways to make GC fast, and incremental and concurrent collectors completely abolish GC pauses, making GC suitable for realtime applications) and "it wastes memory", which has more truth to it, but only because there's a fairly obvious tradeoff between time spent doing GC and memory overhead (if you have enough memory for everything you'll ever allocate, GC is free; work down from there). If your arguments are more substantial, I'd love to hear them; but "if it ain't C it ain't fast enough" is a meme that really needs to be put out of its misery at the earliest opportunity, at least until developer time becomes as cheap as computer time again.

Not least because when it really mattered, C wasn't even fast enough.

Taste

Posted Sep 6, 2007 10:20 UTC (Thu) by IkeTo (subscriber, #2122) (9 responses)

> "it's slow" (which has by now been thoroughly discredited; there are well
> known ways to make GC fast, and incremental and concurrent collectors
> completely abolish GC pauses, making GC suitable for realtime applications)

I think one problem of many GC systems is that they make everything an object that requires the GC to handle. GC is perhaps faster than manual object creation and destruction, but it is definitely slower than no object to create/destruct at all. You can say a piece of Java code that generates an Int and throws it away 2G times is faster than a piece of C++ code that news and deletes an int 2G times. But that's not the point if the only way to pass a modifiable parameter to a method is to create such an Int (or actually worse, create an object containing a public int field and pass that object around), while a C++ programmer will happily just use an int& as a parameter, making sure that there is no exchange of objects in the trade.

Not that I think performance should always be such a big sell. For me, I like the Python system better: it does GC and thus saves programmers the hassle of dealing with memory manually; it uses reference counting most of the time, so the GC cost is mostly spread across all accesses, and it uses full GC in those corner cases involving containers so that a cycle won't eat your memory. So it more or less combines the best of both worlds, except for the GC overhead, which I care very little about anyway.

Taste

Posted Sep 6, 2007 11:20 UTC (Thu) by lysse (guest, #3190) (8 responses)

As far as I can tell, the subjects of "creating objects for everything" and "passing certain parameters by reference" are completely independent of each other. For instance, Oberon and Modula-3 mandate GC, yet also allow integers to be passed by reference without creating new objects to hold them. Java is far from the last word in either GC design or call-by semantics...

Taste

Posted Sep 6, 2007 12:51 UTC (Thu) by IkeTo (subscriber, #2122) (7 responses)

You focus too much on the Java side, and missed the point intended: on the C++ side, no object "creation" or "destruction" cost is needed for passing arguments by reference. The integer being passed is simply created on the stack, its allocation cost shared with other variables (the function just subtracts a larger number from the stack pointer on entry), as is its deallocation cost (the function simply restores the base pointer from a fixed location on the stack). What I mean is that in traditional languages you can do many actions without allocating/deallocating an object. Yes, garbage collection might be "fast", but it cannot beat "no cost". People doing high performance computing should know this, even though most people really should not bother too much with performance.

Taste

Posted Sep 6, 2007 15:42 UTC (Thu) by lysse (guest, #3190) (5 responses)

> You focus too much on the Java side

...which is presumably why I didn't reference two other languages that allow precisely what you're complaining about garbage collected languages not allowing - oh, wait...

> and missed the point intended: on the C++ side, no object "creation" or "destruction" cost is needed for passing arguments by reference.

*sigh* What I said went completely over your head, didn't it...?

Again, that's EXACTLY the point I caught and responded to. Allocating objects on the stack and passing parameters by reference are, contrary to your apparent belief, neither innovations in C++, nor rendered impossible in a garbage collected language; again I cite Oberon, which is just fine with both and yet fully GC'd.

And the issue of whether every first-class, dynamically-allocated object a language deals with must be allocated on the heap is a different one again; lots of optimisations, of varying degrees of complexity, are known that reduce the heap burden substantially. (Indeed, the MLkit compiler statically tracks object lifetimes and allocates up to eight different stacks in heapspace, giving compiled ML programs the speed of stack allocation with the correctness of garbage collection.) But even when all objects must be heap-allocated, it's not necessarily the end of the world in performance terms; Appel (1987) shows that garbage collection can still end up faster than stack-based allocation.

Taste

Posted Sep 6, 2007 16:00 UTC (Thu) by IkeTo (subscriber, #2122) (4 responses)

> Allocating objects on the stack and passing parameters by reference are,
> contrary to your apparent belief, neither innovations in C++, nor rendered
> impossible in a garbage collected language; again I cite Oberon, which is
> just fine with both and yet fully GC'd.

I never said they are "rendered impossible" (even C++ does that!), and the "apparent belief" seems very speculative (e.g., even assembly does stack-based allocation!). Let me recall the beginning of my original post.

> "I think one problem of *many* GC systems is that..."
(emphasis added here)

What I mean is that many "short-comings" that others talk about GC are not intrinsic to the availability of GC, but instead they are due to particular languages which have made certain choices, like which of the allocations they choose to tax the GC system. Again, most people should not care at all.

> Appel (1987) shows that garbage collection can still end up faster than
> stack-based allocation.

I'm interested in this work. Is it available on-line, or if not, can you give the name of the journal/conference where it appeared?

Taste

Posted Sep 6, 2007 16:16 UTC (Thu) by lysse (guest, #3190) (3 responses)

> What I mean is that many "short-comings" that others talk about GC are not intrinsic to the availability of GC, but instead they are due to particular languages which have made certain choices, like which of the allocations they choose to tax the GC system. Again, most people should not care at all.

In that case, I thoroughly misunderstood you - I thought you were making exactly this mistake yourself. Sorry.

> I'm interested in this work. Is it available on-line, or if not, can you give the name of the journal/conference where it appeared?

It's available online: http://citeseer.ist.psu.edu/appel87garbage.html

Taste

Posted Sep 6, 2007 17:03 UTC (Thu) by IkeTo (subscriber, #2122) (2 responses)

> It's available online: http://citeseer.ist.psu.edu/appel87garbage.html

Thanks. Just read it briefly. I would not agree that GC is faster than stack allocation because of that, though. I echo Stroustrup's joke that if you have that much memory, you are supposed to use it to prevent any process from getting into swap. =)

Taste

Posted Sep 7, 2007 23:48 UTC (Fri) by lysse (guest, #3190) (1 response)

Fair enough, but his opinion was peer-reviewed. :)

Taste

Posted Sep 8, 2007 6:41 UTC (Sat) by IkeTo (subscriber, #2122)

> Fair enough, but his opinion was peer-reviewed. :)

The peer review process seldom blocks correct but practically irrelevant work. :)

Taste

Posted Sep 6, 2007 15:51 UTC (Thu) by foom (subscriber, #14868)

Here's some links:

http://www.lisp.org/HyperSpec/Body/dec_dynamic-extent.html

http://www.sbcl.org/manual/Dynamic_002dextent-allocation....

Taste

Posted Sep 6, 2007 20:39 UTC (Thu) by mikov (guest, #33179) (7 responses)

In practice, with the current GC implementations and languages, GC is both slow and noticeable. There is no point in arguing this at all, because I experience it every day - in the Java applications I develop, as well as the ones I use.

You could say that I should use another vendor's implementation (e.g. - http://domino.research.ibm.com/comm/research_projects.nsf... - which AFAIK isn't free), or use this or that magical runtime option (naturally after doing heavy profiling), or move to a quad core CPU, but that doesn't change the _default_ situation.

Plus, technically speaking, GC is extremely complex and complexity isn't free.

You cannot have fast concurrent GC without affecting and significantly complicating the generated code - you need to track asynchronous changes, synchronize threads, etc - all very complicated and error prone operations that have a definite cost. It is no accident that there are no accurate concurrent open source collectors. AFAIK the Mono developers are working on something - http://www.mono-project.com/Compacting_GC - but it is not ready yet.

Taste

Posted Sep 8, 2007 1:48 UTC (Sat) by lysse (guest, #3190) (6 responses)

> You could say that I should use another vendor's implementation (e.g. - http://domino.research.ibm.com/comm/research_projects.nsf... - which AFAIK isn't free)

The last sentence of *that very page* says otherwise. Is the rest of your argument constructed with as much care as this?

Taste

Posted Sep 8, 2007 2:03 UTC (Sat) by mikov (guest, #33179) (5 responses)

"Metronome Garbage collection is now commercially available under the name WebSphere Real Time. This product is the result of a collaboration between the Metronome team and IBM Software Group. The first implementation of Metronome was in the open-source Jikes Research Virtual Machine (RVM)."

Do I misunderstand the meaning of the words "first implementation was" and "commercially available"? Besides, if you had bothered to read my post, you'd see that the "freeness" of an implementation is a secondary point. You'd also probably know that extracting a GC from a research project and transplanting it into another JVM is not a mere technical detail.

Judging by the arrogant and dismissive tone of your post, I can only conclude that you have nothing informative to say on this subject. I also dare say that your attitude negates the value of all of your posts on the subject, (which even though I didn't agree with completely I initially found interesting).

Taste

Posted Sep 13, 2007 2:39 UTC (Thu) by lysse (guest, #3190) (4 responses)

You are free to conclude exactly what you please, of course, but consider what conclusions I might have formed that made me decide to be "arrogant and dismissive". Ironic, n'est-ce pas?

Taste

Posted Sep 13, 2007 3:06 UTC (Thu) by mikov (guest, #33179) (3 responses)

If you aren't saying anything on the subject of garbage collection, why do you bother replying? I don't know who you are or what your credentials are, so my opinion of you doesn't matter. And vice versa.

My point is clear enough - garbage collection, as it is experienced in practice in everyday life, is slow. There exist solutions, but they are proprietary and far from common.

Please, try to restrain your apparent urge to be rude and either say why you think my point isn't valid (or what your proposed solution is), or go away.

Taste

Posted Sep 13, 2007 3:26 UTC (Thu) by lysse (guest, #3190) (2 responses)

I didn't get as far as saying that your point wasn't valid. I said that your fact-checking was laughably unthorough, with the implication that your argument was founded primarily on personal prejudice and you weren't about to let a good fact get in the way. Generally, if I find one claim that's directly negated by the evidence presented in support of it, I don't hang around to see how much of the rest of someone's argument turns out to be a big pile of manure.

And your response to being caught in a falsehood (whether you like it or not, Metronome is unequivocally not non-free; "a commercial version is now available" != "it is not free") was to defend the falsehood and attack your challenger. The former undermines your credibility even further, and gives me no reason to change my initial assessment. The latter has no place in civilised debate, and you should be ashamed of yourself for doing it.

However, the fact that having done so, you then have the hypocrisy and presumption to upbraid me for an "apparent urge to be rude", demonstrates pretty clearly that you *have* no shame. You are not worth communicating with, frankly, let alone debating. You want to know why I didn't see any good reason for wasting my time on you? Reread your own posts. You've given me no reason to think you're worth a damn, and plenty of cause to decide you aren't - and that's even *before* we consider the merits of your arguments.

I was hoping to avoid telling you exactly what I thought of you, but if you're going to accuse me of rudeness when I am showing restraint, I have nothing to lose. So here it is; I hope it justifies your every prejudice about me, and gives you a few you hadn't thought of. You're not the kind of person I *want* thinking well of me.

Taste

Posted Sep 13, 2007 3:53 UTC (Thu) by mikov (guest, #33179) (1 response)

Another post without information. I did not expect that. You don't even have the decency to admit that:
- WebSphere Real Time is not free
- in any case that fact is irrelevant for the point I was making

At least we will both be confident in the knowledge of that.

Go away, troll.

Taste

Posted Sep 13, 2007 11:59 UTC (Thu) by lysse (guest, #3190)

You're still expecting *decency* from me? Weird.

And the funny thing is, for all your assertions that I'm a troll, *you* initially replied to *me*. And then threw a tantrum when I declined to respect your authoritah.

And now you won't stop, because you Just Have to have the last word, even though the sane thing to do would have been to stop responding at least two comments ago.

In real life, you'd be that kid on the playground who tries to butt into a conversation I'm having with my friends and then starts calling me names and complaining to the teachers because I gave you the brush-off.

And I haven't responded to an argument because you haven't MADE an argument. "Real-world GC is slow in my experience" is an assertion, and a subjective one at that, with a huge great amorphous blob of a term in the middle of it. (I've already mentioned hypocrisy, haven't I? Just checking.) Either put something objective and quantifiable on the table, or take your invisible ball and fuck off back to the infant playground.

Taste

Posted Sep 6, 2007 15:42 UTC (Thu) by foom (subscriber, #14868) (5 responses)

> ITA goes to great lengths never to trigger a GC cycle, because the first one would take longer than restarting the program.
Not true. CMUCL/SBCL's garbage collector isn't the most advanced in the world, but it's most certainly not *that* bad.

> Therefore, they call out to a huge C++ library to do floating-point calculations,
Not true.

> XML parsing, database lookups
Okay, yes.

> or anything that needs dynamic memory allocation.
Nope.

> They use explicit integer type declarations throughout, and use peculiar idioms known to compile to optimal machine code for inner loops.
It's a nice feature of lisp that you can make it very fast when you need to, without changing languages. (+ x y) can compile into a single machine ADD instruction, if you tell the compiler that you expect the arguments and result to fit in a machine word. And if you don't tell it that, your integers can be as big as you like, not limited by hardware.

> Bugs often arise from integer overflows because they don't have a 64-bit integer type.
The compiler will check that the actual type matches the declared type if you're running in lower optimization modes (which are still fast enough for development and testing), so it will notice that and throw an error. So, you can of course write buggy code, but unlike C, integer overflow is not completely silent.

PS: yes, I work at ITA on this product. It's possible all of the things you say may have been true when inferior lisp implementations had been used in the past. Maybe one of those was in use when you worked there.

Taste

Posted Sep 6, 2007 20:06 UTC (Thu) by ncm (guest, #165) (4 responses)

At the time I was at ITA, they did go to great lengths to avoid ever triggering a GC. (Are you saying QPX runs GC cycles now?) The only integer type that could be used without accumulating garbage was 30 bits. It's one thing to know you're overflowing your ints and quite another to avoid doing it; sometimes you really need more bits. Using floating-point values did accumulate garbage. There was discussion of sponsoring a 64-bit CMUCL port, which would have offered a bigger fixed-size integer type, enough address space to tolerate more accumulated garbage, and (maybe?) a native floating-point type. I suppose that port, or the SBCL equivalent, is in use now. Restarting the program once a day while other servers take up the load is an extremely reliable sort of garbage collection, but you need lots of address space to tolerate much accumulated garbage.

Taste

Posted Sep 6, 2007 23:22 UTC (Thu) by foom (subscriber, #14868) (3 responses)

> (Are you saying QPX runs GC cycles now?)

Yes. With a generational collector, triggering minor GCs is not actually a terrible thing. I'm sure QPX ran the GC while you were around as well, although, as you say, some effort was put in to try to avoid it happening very often.

But, as it turns out, the strange things QPX did to avoid allocating new memory for objects that needed to be on the "heap" were actually *slower* than allocating the objects normally and letting the GC do its thing.

Basically, the GC works fine, and it was a mistake to try to avoid using it. (This mistake wasn't made because of stupidity or premature optimization; it was an optimization made for another lisp implementation with a poor GC, and was kept around without its necessity being re-assessed as soon as it perhaps should have been.)

Of course, not allocating memory at all is going to be faster than allocating memory, but when you do it, a garbage collector is a fine thing to have.

Taste

Posted Sep 7, 2007 0:40 UTC (Fri) by ncm (guest, #165) (2 responses)

"I'm sure QPX ran the GC while you were around"

Long experience has taught me to be very distrustful of anything a GC advocate is "sure" of.

Evidently ITA's "mistake" in avoiding GC cycles was to insist on running their application for several years before a Lisp runtime with a tolerable GC was available to them. They certainly were not lax in trying to obtain one: they used Allegro CL at the time I started, and dropped it for CMUCL while I was there. SBCL was under active development. They employed one of the primary CMUCL maintainers. (I think he would be surprised to find his competence disparaged here; he was always admirably forthcoming with me in acknowledging CMUCL's then-current and Lisp's inherent limitations.)

This exchange illustrates well some of the reasons why Lisp hasn't exactly taken the world by storm. Chief among them must be Lisp advocates still unable to understand why not.

But this is all off-topic, and I apologize again to those reading the article to learn about system call interfaces.

Taste

Posted Sep 7, 2007 2:43 UTC (Fri) by foom (subscriber, #14868) (1 response)

As I said, the mistake was that the assumption that the GC did not work acceptably went too long without being properly re-tested. But attacking straw men is more fun, right? I'm sorry you have such an acute dislike for garbage collectors that you need to make things up to prove them terrible.

Lest anyone be confused: CMUCL and SBCL's garbage collectors are essentially identical, and CMUCL has had a generational garbage collector since... a very long time ago.

While Lisp has most certainly not taken over the world, garbage collectors have.

Taste

Posted Sep 7, 2007 23:53 UTC (Fri) by jsnell (guest, #47245)

> Lest anyone be confused: CMUCL and SBCL's garbage collectors are essentially identical, and CMUCL has had a generational garbage collector since... a very long time ago.

While the code in the two GCs might be essentially identical, that doesn't really mean that their performance characteristics are. There are many important performance improvements in the memory management of sbcl which never made it back to cmucl. Some of those improvements were in the GC, others in related areas like the fast path of the memory allocation sequence. As a result of those, cmucl can take 2x the time sbcl does to run an allocation-heavy program and spend 5x as long in the GC for it [*].

But ultimately those improvements were just tweaks on a 10-year-old GC that uses only 20-year-old concepts, and which was bolted onto a compiler that doesn't really provide any support for the GC. It's not hard to imagine that newer GC designs, or ones that are properly integrated into the compiler, would perform even better.

[*] Those results are from the Alioth binary-trees benchmark with n=20, since I don't have any better benchmarks accessible right now. Incidentally, in the shootout results the Lisp version of this program is faster than the C one.

Taste

Posted Sep 6, 2007 18:21 UTC (Thu) by vmole (guest, #111)

> Lisp has many qualities that cause people who learn it early in life to love it forever.

Actually, what I think most of the worse-is-better crowd loved was not "Lisp, the language" (although that is certainly part of it), but the Lisp Machine development environment, which completely blew away the then-current C/Unix/Sun3 environment (and still blows away the now-current C/Unix/whatever environment).

For a fun read, I recommend the Unix Hater's Handbook, available as a PDF, and with the preface online: "...What used to take five seconds now takes a minute or two. (But what's an order of magnitude between friends?) By this time, I really want to see the Sun at its best, so I'm tempted to boot it a couple of times." Classic.

typo

Posted Sep 4, 2007 6:29 UTC (Tue) by rasmus (guest, #1728)

"[...]or bed design decisions". It made more sense for me to read 'bad' here :)

LinuxConf.eu: Documentation and user-space API design

Posted Sep 4, 2007 9:17 UTC (Tue) by mjthayer (guest, #39183) (8 responses)

If people did not access kernel interfaces directly, but only through shared libraries, that would greatly limit the number of places where changes needed to be made if a kernel ABI was broken, and make it much more feasible to remove unwanted ABIs. The shared libraries need not be maintained by the kernel people, they would just need to accept the responsibility of tracking ABI changes. MS stopped guaranteeing the stability of their kernel ABIs years ago - might the same not work in the Linux world?

Not to mention the added portability benefits of using shared libraries over direct use of ABIs - as shown by Wine :)

Libraries

Posted Sep 4, 2007 9:31 UTC (Tue) by corbet (editor, #1) (4 responses)

People do access system calls via libraries. You still can't break systems running older libraries, though, so the situation doesn't really change.

Libraries

Posted Sep 4, 2007 9:52 UTC (Tue) by mjthayer (guest, #39183) (3 responses)

That is currently a conscious decision though. It would equally well be possible to require people upgrading their kernel to upgrade system-critical libraries in sync. Current interfaces carry a stability guarantee, but in theory (read: in my idle speculation :) new ones could be labeled as "not guaranteed, only access through a library". And anyone writing such a library would be aware of their responsibility.

In the end, a system which should not be changed can stay with a given kernel.

Libraries

Posted Sep 4, 2007 12:20 UTC (Tue) by nix (subscriber, #2304) (2 responses)

That way lies alsa-lib, commonly regarded (by seemingly everyone but the ALSA developers) as a really Bad Idea.

Libraries

Posted Sep 4, 2007 13:49 UTC (Tue) by mjthayer (guest, #39183) (1 response)

Anything can be done wrong :) (Note that I have never programmed Alsa, so I can't comment there.) However, glibc essentially does the same, with the difference that at least on Linux the underlying interfaces are guaranteed. And unlike Alsa the interfaces and the library are maintained by different people, which might not be such a bad thing.

Libraries

Posted Sep 4, 2007 14:56 UTC (Tue) by nix (subscriber, #2304)

glibc provides an interface between kernel syscalls and userspace, yes, but interfaces don't appear in glibc until the kernel syscall semantics have been nailed in stone, and glibc then maintains those semantics forevermore, using compatibility code if necessary (see e.g. the behaviour of nice()).

Requiring a shared library to access the kernel

Posted Sep 4, 2007 14:05 UTC (Tue) by jreiser (subscriber, #11027) (1 response)

I have written more than a handful of useful software that must access the kernel directly through an absolute system call interface. The interfaces of glibc cannot provide the required services, which include self-virtualization, auto relocation, introspection, control over binding, small size, speed, etc.

The history of glibc with regard to interface stability is not pretty, either. For example: @GLIBC_PRIVATE, hp_timing, _ctype_, *stat(), errno. It's important that both the kernel interfaces and the libc interfaces be well designed and well implemented and well documented.

Requiring a shared library to access the kernel

Posted Sep 5, 2007 0:29 UTC (Wed) by bartoldeman (guest, #4205)

'info libc' for Glibc 2.6.1 tells me:
This is Edition 0.10, last updated 2001-07-06, of `The GNU C Library Reference Manual', for Version 2.3.x of the GNU C Library.
which does not look very promising.

It would be great if someone could fund a technical writer to work on this manual... the POSIX threads documentation still talks about Linuxthreads instead of NPTL, and many of the new functions mentioned in NEWS are not documented at all, or documented elsewhere.

For instance, ppoll(2) is in the man pages but not in the glibc manual, and there are many others.

I usually try to check both man pages and info to be sure.

LinuxConf.eu: Documentation and user-space API design

Posted Sep 8, 2007 20:55 UTC (Sat) by jzbiciak (guest, #5246)

Two words: Static linking.

Lax software development in Linux

Posted Sep 4, 2007 12:43 UTC (Tue) by jreiser (subscriber, #11027)

> Michael presented three examples to show how the process of writing documentation can turn up bugs.

In each case there were no unit tests (by the original implementors or anyone else) and no coverage by the Linux Test Project, yet the software was accepted. No one should be surprised at the shoddy results. The software must stay in the -mm tree (or other staging area) until it is accompanied by a testcase which exercises each feature and error condition.

LinuxConf.eu: Documentation and user-space API design

Posted Sep 4, 2007 19:37 UTC (Tue) by njs (guest, #40338) (2 responses)

Surely it would not be too much to ask people submitting patches to write, if not documentation in English, at least documentation in code?

I mean, they're testing their patch somehow, usually with some ugly hacked-up code. Would it be dramatically more work to require them to additionally meet basic code-cleanliness standards, exercise the full interface, and always send in the code? Then at least one person has had to bang their head against using the interface, and there is some kind of full specification written down (even if not in the most convenient form)...

LinuxConf.eu: Documentation and user-space API design

Posted Sep 4, 2007 23:05 UTC (Tue) by arnd (subscriber, #8866) (1 response)

One point that Michael made in his talk was that it's useful to have the documentation written by somebody other than the author of the code, in order to increase the chances of finding bugs in the process.

Of course the requirement of having documentation for everything is a very good idea nonetheless, and having both written by the same person can only be better than no documentation at all.

Another problem we still need to work on is documentation for all the existing code -- it's hard to make documenting new stuff a requirement when there are so many examples of where we haven't done the documentation in years. This is even more true for kernel-internal interfaces than for user APIs.

LinuxConf.eu: Documentation and user-space API design

Posted Sep 5, 2007 13:12 UTC (Wed) by njs (guest, #40338)

>One point that Michael made in his talk was that it's useful to have the documentation written by somebody other than the author of the code, in order to increase the chances of finding bugs in the process.

Sure. But everyone also keeps saying that schemes which involve too much overhead won't fly -- and reasonably so: it's already very hard to find patch reviewers, and requiring patch submitters to find someone to write up docs for them before their patch can be accepted may just be unworkable. So I was wondering if one could get 80% of the benefit with 5% of the work.

(Note that I'm only suggesting the patch author write up example code, documentation proper could well still be done by someone else.)

>Another problem we still need to work on is documentation for all the existing code -- it's hard to make documenting new stuff a requirement when there are so many examples of where we haven't done the documentation in years. This is even more true for kernel internal interfaces than for user APIs.

Don't know if I believe this... it's totally common for projects to say "hey, we used to do things such-and-such way, we've realized it was a bad idea, from now on we're doing them differently" and to grandfather in the old stuff in the process. And internal interfaces are both less stable and more aimed at experts, so the documentation/design problems are far less urgent.

LinuxConf.eu: Documentation and user-space API design

Posted Sep 5, 2007 5:55 UTC (Wed) by lacostej (guest, #2760) (1 response)

It looks like issues are detected within a time frame of +2 releases. Some of the interfaces reported here also look like no-one really used them; otherwise their issues would have been revealed.

Could it help to state that a particular system call released in stable release n isn't considered stable until n+2, and that we allow breaking it in that time frame? That's probably just pushing the problem...

Better would be to wait until it has n reported users (n>=3). So what about only adding new APIs to the stable kernel once enough people have used them? That should also increase the testing of the non-official stable trees.

LinuxConf.eu: Documentation and user-space API design

Posted Sep 5, 2007 6:29 UTC (Wed) by mkerrisk (subscriber, #1978)

> Could it help to state that a particular system call released in stable release n isn't considered stable until n+2, and that we allow breaking it in that time frame? That's probably just pushing the problem...

This is an idea that has been getting some consideration. It might help matters a little, but it causes other types of pain (e.g., a new interface becomes moderately used by a number of apps, but then a critical design issue forces an interface change at n+2; or, userland developers refrain from using an interface until n+2, because they know it might change).

My hypothesis is that we could get a lot of the benefit, and avoid the sorts of pain I just described, if we could improve and rigorously apply a documentation and testing process for new interfaces (i.e., a process that occurs in parallel with the implementation of the interface, and is completed by the time of initial stable release, rather than after the fact).

Once upon a time (;-))

Posted Sep 12, 2007 0:14 UTC (Wed) by davecb (subscriber, #1574) (1 response)

When one wanted to provide a chunk of functionality to Multics, one wrote a white paper arguing its desirability. This was Good. Then we wrote tutorial and manual pages, because we were writing, anyway.

The Unix folks from Bell Labs, who had worked on Multics, decided that, when developing Unix, writing man pages was Almost As Good. As were tutorials.

All joking aside, how about a requirement that one write a white paper, a man page, **and** a tutorial before you can add a new feature to the Linux or BSD kernel?

--dave

Once upon a time (;-))

Posted Sep 13, 2007 18:17 UTC (Thu) by larryr (guest, #4030)

> how about a requirement that one write a white paper, a man page, **and** a tutorial before you can add a new feature

Is it ok if it says

xyz is fully documented in the texinfo documentation. To access the help from your command line, type info xyz

Larry


Copyright © 2007, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds