Glibc change exposing bugs

People experiencing sound corruption or other strange bugs on recent distribution releases may want to have a look at this Fedora bugzilla entry. It seems that the glibc folks changed the implementation of memcpy() to one which, in theory, is more highly optimized for current processors. Unfortunately, that change exposes bugs in code where developers ignored the requirement that the source and destination arrays passed to memcpy() cannot overlap. Various workarounds have been posted, and the thread includes some choice comments from Linus Torvalds, but the problem has been marked "not a bug." So we may see more than the usual number of problems until all the projects with sloppy memcpy() use get fixed. (Thanks to Sitsofe Wheeler).
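
As a minimal sketch of the kind of call at issue: discarding the front of a buffer by sliding the remaining bytes toward the start copies between overlapping regions, so memmove() is the call the standard permits. The helper name `drop_prefix` is hypothetical and is not taken from any of the affected projects.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: discard the first n bytes of buf[0..len) by
   sliding the remaining bytes down to the start of the buffer.
   Source and destination overlap whenever n < len - n, so memmove()
   is required; memcpy() would be undefined behaviour in that case.
   Returns the new length. */
size_t drop_prefix(char *buf, size_t len, size_t n)
{
    assert(n <= len);               /* documented precondition */
    memmove(buf, buf + n, len - n); /* overlap-safe copy */
    return len - n;
}
```

Called as `drop_prefix(buf, 8, 3)` on a buffer holding "abcdefgh", this leaves "defgh" in the first five bytes.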

Flash plugin

Posted Nov 10, 2010 19:13 UTC (Wed) by cesarb (subscriber, #6266) [Link] (3 responses)

I find it interesting that the issue was found on the flash plugin. Out of the thousands of commonly used packages, did only flash get it wrong? Or was it just bad luck of being the first to get hit by the change?

Flash plugin

Posted Nov 10, 2010 19:20 UTC (Wed) by jwb (guest, #15467) [Link]

It is probable that Flash Player is the worst piece of widely-deployed software on Linux clients by a huge margin.

Flash plugin

Posted Nov 11, 2010 0:37 UTC (Thu) by marcH (subscriber, #57642) [Link]

> I find it interesting that the issue was found on the flash plugin.

Steve Jobs at work in glibc?

Flash plugin

Posted Nov 11, 2010 19:28 UTC (Thu) by iabervon (subscriber, #722) [Link]

Out of the thousands of commonly used packages, almost all have always had some user on a platform that doesn't handle overlapping ranges to memcpy. Flash is essentially the last to get hit by the change: everybody else hit the issue earlier on platforms that don't get much press.

Glibc change exposing bugs

Posted Nov 10, 2010 19:25 UTC (Wed) by brunowolff (guest, #71160) [Link] (4 responses)

Actually I think we may have first seen this with squashfs. Problems showed up right before the F14 alpha. Phillip found the cause of the problem was using memcpy instead of memmove.

Glibc change exposing bugs

Posted Nov 10, 2010 19:52 UTC (Wed) by brunowolff (guest, #71160) [Link] (3 responses)

Glibc change exposing bugs

Posted Nov 11, 2010 4:12 UTC (Thu) by plougher (guest, #21620) [Link] (2 responses)

Thanks Bruno,

The Redhat bugzilla doesn't have a description of the memcpy problem that ultimately caused the bug, but my Squashfs CVS commit does

http://squashfs.cvs.sourceforge.net/viewvc/squashfs/squas...

From my experience hitting this problem I think the people insisting this is merely bad programmers using memcpy where they should have used memmove are missing the point. I quite legitimately wrote code using memcpy where it was known the areas did not overlap, but over the years code changes elsewhere happened which then caused the areas to overlap in certain circumstances, breaking design assumptions made in years old code. This was obviously a bug, but one which hitherto had been completely hidden by the behaviour of memcpy.

In other words programmers can make well meaning mistakes especially when dealing with old code or with library routines where the underlying implementation is not known. Testing with the old behaviour of memcpy won't show anything is amiss.

Flash is obviously an example where memcpy of overlapping areas occurs frequently and so it has shown up quite quickly. There may be many applications using memcpy which in rare circumstances use overlapping areas, leading to unexplained corruption and data loss, which have not yet shown up.

Glibc change exposing bugs

Posted Nov 12, 2010 6:55 UTC (Fri) by hozelda (guest, #19341) [Link]

>> where the underlying implementation is not known

I'm not disagreeing with the gist of what you mentioned, but this does point out that software is very complex in that the exact semantics of every interacting component has compounding effects on the overall result. This is why there is a tendency to code to achieve a pass on tests rather than by strict well defined "interface specs". However, having access to source code and working openly means problems can be identified quicker and shared quicker with other projects. If source code was not available (eg, if the library change had happened on a proprietary platform), finding the mistake would be more difficult and costly and there would be more bugs that would only come out under odd scenarios because thorough testing is impossible and certainly more difficult than analyzing even lots of source code.

An ideal wish: I want to see code assumptions be documented better on source code (assert calls and/or prose), even though we do have access to version control, peer review, and sometimes lots of testing with decent feedback. HTML can be generated from a heavily documented code to exclude all the little comments except when you want to see them (eg, before a final release or for inspection/audits). Trying to keep the source clean makes it easier to end up with problems over time. We would benefit from extensive standards for documentation, and those that like simple tools (like a simple text editor) can run simple filters on the project from the make file so that you can have all those notes not pollute a working copy.

[An extensive test suite designed to catch these problems is similar in effect but will leave holes whereas descriptive text can offer an important layer of defense.]

[Many times when you are (I am) learning a new code base, you have to make these notes anyway. Why not just formalize the effort and keep it together with the other code? We should even be able to have tests run from this documentation. Tagging precise spots can be done using any "unique" delimiter and can take an sgml approach. Then git and other tools can identify "conflicts" not just from 3-way merges with overlaps but from simple edits which overlap with a comment's scope, triggering a requirement to update the comments whose scope was touched.]

[Another approach would be to try and keep something like a git branch of such comments/tests in sync and require that it be run before accepting commits on the main clean branch.]

[It might just be too much effort to do this in terms of bang for buck. Comments can very easily grow stale, though, that is why I suggested automatic conflict identification efforts within the workflow.]

Glibc change exposing bugs

Posted Nov 26, 2010 13:37 UTC (Fri) by SEJeff (guest, #51588) [Link]

cvs for a filesystem? wow

Glibc change exposing bugs

Posted Nov 10, 2010 19:40 UTC (Wed) by clugstj (subscriber, #4020) [Link] (98 responses)

I think I have to respectfully disagree w/ Linus. The first paragraph of the manpage tells you specifically NOT to do this, I have to agree w/ the GlibC folks that this is not a bug (in GlibC).

Glibc change exposing bugs

Posted Nov 10, 2010 19:46 UTC (Wed) by rodgerd (guest, #58896) [Link] (90 responses)

After the o_ponies poo-flinging from kernel developers in the direction of app developers, it's pretty funny seeing the lead kernel developer complaining about... code conforming to its documented behaviour.

Glibc change exposing bugs

Posted Nov 10, 2010 20:03 UTC (Wed) by corbet (editor, #1) [Link] (37 responses)

That's a germane example, actually. "Poo flinging" notwithstanding, the kernel developers fixed things so that applications would not lose data even if they weren't following standard behavior. Not breaking things was seen as more important than doing something because the posted rules say you can.

I don't believe that Linus (or anybody else) is saying that the broken applications are not buggy. What I'm hearing is that those applications have worked for years and that people should think for a long time before introducing a change which breaks them. Thus, Linus asks: what's the benefit that justifies such a change? I think it's a reasonable question.

Glibc change exposing bugs

Posted Nov 10, 2010 20:10 UTC (Wed) by jwb (guest, #15467) [Link] (18 responses)

There are a huge variety of improvements to Linux which have broken or will break Flash Player, for example Flash abused the ALSA API until Pulse came along and exposed that abuse.

Glibc change exposing bugs

Posted Nov 10, 2010 20:38 UTC (Wed) by neilbrown (subscriber, #359) [Link] (17 responses)

This all sounds like a very strong recommendation in favour of Rusty Russell's Maxim of API development: APIs should be hard to misuse. memcpy, and apparently ALSA, are easy to misuse.

So implementing memcpy as memmove - which Linus says in the bugzilla threads is largely what the kernel does - sounds very sensible. memmove is much harder to misuse.

Glibc change exposing bugs

Posted Nov 13, 2010 1:07 UTC (Sat) by rriggs (guest, #11598) [Link] (3 responses)

memmove: safe, fast, verbose function name
memcpy: unsafe, at least as fast as memmove, one less character to type

Which one do you think your average C programmer will choose?

Which one do you think new programmers are taught to use (in schools that still teach C programming)?

Glibc change exposing bugs

Posted Nov 13, 2010 2:52 UTC (Sat) by neilbrown (subscriber, #359) [Link] (1 responses)

So we can save the world by creating a 'memmv' in glibc which aliases memmove? Brilliant!

Glibc change exposing bugs

Posted Nov 15, 2010 16:43 UTC (Mon) by renox (guest, #23785) [Link]

Too late!

And memcpy should also be named mem_unsafe_copy. But yes, if you tell developers to use the safe function by default and to optimize only when they can show benchmarks proving that the optimisation makes a difference, you'd probably get better software (if a bit slower).

Glibc change exposing bugs

Posted Oct 17, 2013 12:49 UTC (Thu) by jzbiciak (guest, #5246) [Link]

You're calling memmove verbose as compared to memcpy? Even Ken Thompson said if he had it to do over, he'd spell creat() with the final 'e'.

Glibc change exposing bugs

Posted Nov 25, 2010 15:13 UTC (Thu) by Spudd86 (guest, #51683) [Link] (2 responses)

It's not so much that ALSA is easy to misuse (although it probably is), it's that certain parts of it are impossible to emulate from userspace. An app that actually NEEDS those parts is NOT misusing the API when it uses them (for example, pulseaudio itself uses those bits).

The problem is that most apps don't actually need those bits, so they just needlessly break software like pulseaudio (and also break on bluetooth audio).

Pulseaudio does use those unemulatable APIs, but it falls back if they don't work, and it has good reasons to use them: it can hand over large chunks of audio data yet still change that data later (if, for example, something else starts playing audio). This saves power because pulse won't wake your CPU as often. But those APIs don't emulate well, and until pulse came along nobody had tried to do that sort of thing, so it broke.

Glibc change exposing bugs

Posted Nov 25, 2010 16:07 UTC (Thu) by foom (subscriber, #14868) [Link] (1 responses)

If the documentation wasn't so terrible, this probably wouldn't be a problem. It doesn't give *any* clue that, for example, a developer shouldn't use the mmap functions. In fact it makes it sound like you should use them, because they're zero-copy (and that's better, right?)

Glibc change exposing bugs

Posted Nov 25, 2010 16:39 UTC (Thu) by Spudd86 (guest, #51683) [Link]

Well yea, but Lennart Poettering does have a blog post where he says exactly what subset of ALSA's API you should restrict yourself to, perhaps someone should put that into the ALSA docs.

Glibc change exposing bugs

Posted Oct 17, 2013 12:42 UTC (Thu) by jzbiciak (guest, #5246) [Link] (9 responses)

One major reason the remaining distinction between memcpy and memmove exists in the standard seems to be this:

To write memmove completely within conformant C, you need a malloc and a double-copy. That's because in that mythical Platonic ideal of a language, you cannot compare two pointers that do not point into the same object, and you are not guaranteed that the arguments to memmove point within the same object. That is, a fully compliant memmove would look something like this:

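    #include <stdlib.h>

    void *memmove(void *dst, const void *src, size_t len)
    {
        const char *srcc = src;
        char *dstc = dst;
        char *temp = malloc(len);
        size_t i;

        /* What if 'malloc' fails?  call abort()?  Unspecified!
           abort() here is one arbitrary choice. */
        if (temp == NULL && len != 0)
            abort();

        /* First pass: read all of src before any byte of dst changes. */
        for (i = 0; i != len; i++)
            temp[i] = srcc[i];

        /* Second pass: write the saved bytes to dst. */
        for (i = 0; i != len; i++)
            dstc[i] = temp[i];

        free(temp);
        return dst;
    }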

And, on 16-bit segmented computers or other computers lacking flat memory spaces, both of which are rather from a Platonic ideal, comparing two pointers isn't always as straightforward as you might like. So practically, memcpy offers some noticeable performance benefits on those machines.

Yes, I'm aware that the actual language in the standard says 'as if' the source was first copied to a temporary array. But, as I recall, a fully conformant C program has no other option. The 'as-if' clause allows library writers to avoid such shenanigans, without requiring them to do so. So much hair-splitting...

If it weren't for that, you could make the argument that separate memcpy and memmove were historical accidents, and change the C standard at some point to remove the restrictions on memcpy to make them both equivalent. That new memcpy would then adhere to Rusty's Maxim, or at least come much closer. And, from the thread linked above, that's pretty much what BSD did, it sounds like.

As a half step, you could define memcpy as always copying forward, to make "sliding down" safe, but that just seems a little goofy for a number of reasons.
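
The "sliding down" point can be illustrated with two plain byte loops. This is only a sketch: `copy_forward` and `copy_backward` are hypothetical names, and neither is meant to represent what any glibc memcpy actually does. Unlike memcpy(), these loops have well-defined behaviour on overlapping arrays, which is what makes the comparison possible.

```c
#include <stddef.h>

/* Copy n bytes front to back, one byte at a time. */
void copy_forward(char *dst, const char *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Copy n bytes back to front, one byte at a time. */
void copy_backward(char *dst, const char *src, size_t n)
{
    while (n-- > 0)
        dst[n] = src[n];
}
```

With a buffer holding "0123456789", `copy_forward(buf, buf + 2, 8)` slides the data down correctly, because every byte is read before it is overwritten. `copy_forward(buf + 2, buf, 8)` instead re-reads bytes it has already written and smears the first two characters across the buffer, whereas `copy_backward(buf + 2, buf, 8)` handles that direction correctly.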

I'm personally with Linus that the glibc breakage seems gratuitous. I'd lean towards making memcpy and memmove equivalent if their performance is largely indistinguishable. Arguing that the software is broken when it worked for years with the old library reminds me of this silly meme. It's the kind of hair-splitting that only a bureaucrat or chapter-verser could love.

Glibc change exposing bugs

Posted Oct 17, 2013 12:44 UTC (Thu) by jzbiciak (guest, #5246) [Link] (8 responses)

...rather far from a Platonic ideal...

Need. More. Coffee.

Glibc change exposing bugs

Posted Oct 18, 2013 13:31 UTC (Fri) by meuh (guest, #22042) [Link] (7 responses)

... or a time machine to go back to 2010 ...

If we were on "stackoverflow", you would have earned the "Necromancer" badge ;)

Glibc change exposing bugs

Posted Oct 18, 2013 14:08 UTC (Fri) by jzbiciak (guest, #5246) [Link] (6 responses)

Yeah, I was up late and followed a link into the ancient thread. The next morning, I resumed reading, forgetting I was in a 3 year old thread. Ah well. :-)

Glibc change exposing bugs

Posted Oct 21, 2013 20:37 UTC (Mon) by nix (subscriber, #2304) [Link] (5 responses)

Your comment was interesting anyway. This is the relevant guarantee from C89 (C99 and C11 have similar wording):

> If two pointers to object or incomplete types compare equal, they point to the same object. If two pointers to functions compare equal, they point to the same function. If two pointers point to the same object or function, they compare equal. If one of the operands is a pointer to an object or incomplete type and the other has type pointer to a qualified or unqualified version of void, the pointer to an object or incomplete type is converted to the type of the other operand.

The problem here is that this does not guarantee that two pointers to the same object always compare equal, but rather that if they compare equal, they are pointers to the same object (and similarly for comparison operators). We can tell if two pointers definitely are pointers within the same object, but if the comparison fails we cannot conclude anything. This is unfortunately the opposite of the guarantee that memmove() needs if it is to transform itself into a memcpy() when needed, so (in the absence of a Standard-blessed way to normalize pointers) you are indeed forced to do a double-copy at all times when writing memmove() in Standard C.

Glibc change exposing bugs

Posted Oct 21, 2013 20:49 UTC (Mon) by khim (subscriber, #9252) [Link] (4 responses)

> This is unfortunately the opposite of the guarantee that memmove() needs if it is to transform itself into a memcpy() when needed, so (in the absence of a Standard-blessed way to normalize pointers) you are indeed forced to do a double-copy at all times when writing memmove() in Standard C.

Note that in the real world there is no such guarantee (hint, hint), thus GLibC's memmove sometimes works and sometimes does not work.

Glibc change exposing bugs

Posted Oct 23, 2013 14:23 UTC (Wed) by nix (subscriber, #2304) [Link] (3 responses)

You appear to have read what I said exactly backwards. Of *course* C on Unix conforms to the guarantee that pointers that compare equal will point to the same object! What you can do with mmap() is produce two pointers that do *not* compare equal but which nevertheless point to the same object. This is exactly what torpedoes a fast auto-reducing-to-memcpy() memmove() implementation, since there is no way to efficiently tell if two pointers point into the same aliased region: even modifying the region via one pointer and probing via the other won't work because they could be pointing at different parts of the aliased region rather than e.g. at the start of it (you are not restricted to call memcpy()/memmove() on pointers returned from malloc(): you can copy parts of objects, and the like).

This behaviour is explicitly permitted by the Standard: segmented architectures like MS-DOS were like this decades ago. The guarantee that a == b returns nonzero only when a and b are pointers to the same object holds nonetheless. It's just a less useful guarantee than we might like.
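
The aliasing described above can be reproduced with a short POSIX sketch, offered as an illustration under stated assumptions rather than anything from glibc: it maps the same scratch file twice with MAP_SHARED, stores through one mapping, and loads through the other. The function name `distinct_pointers_alias` and the 4096-byte length are hypothetical choices.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if two unequal pointers were observed to refer to the same
   memory, 0 if not, and -1 if the mappings could not be set up. */
int distinct_pointers_alias(void)
{
    const size_t len = 4096;
    int result = -1;
    FILE *f = tmpfile();                /* anonymous scratch file */
    if (f == NULL)
        return -1;
    int fd = fileno(f);
    if (ftruncate(fd, (off_t)len) == 0) {
        /* Map the same file contents at two separate addresses. */
        char *a = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        char *b = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (a != MAP_FAILED && b != MAP_FAILED) {
            a[0] = 'x';                         /* store via one mapping */
            result = (a != b) && (b[0] == 'x'); /* load via the other */
        }
        if (a != MAP_FAILED)
            munmap(a, len);
        if (b != MAP_FAILED)
            munmap(b, len);
    }
    fclose(f);
    return result;
}
```

A memmove() that decides between forward and backward copying purely by comparing the two addresses has no way to notice this kind of overlap.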

Glibc change exposing bugs

Posted Oct 23, 2013 15:47 UTC (Wed) by khim (subscriber, #9252) [Link] (2 responses)

My point was that real-world GLibC-implemented memmove does not actually work when used on POSIX system. It compares pointers and assumes that if they are different then underlying memory is also different!

Which means, strictly speaking, that memmove in GLibC is not standards-compliant :-)

Glibc change exposing bugs

Posted Oct 23, 2013 16:47 UTC (Wed) by jzbiciak (guest, #5246) [Link]

From my perspective, the two of you are in violent agreement. :-)

Glibc change exposing bugs

Posted Oct 23, 2013 18:09 UTC (Wed) by nix (subscriber, #2304) [Link]

What? Surely not...

... bloody hell, it does. Or many of the assembler versions do anyway. Or, rather, it assumes that distinct addresses cannot alias.

I suppose this is probably safe in practice, because if you *do* use mmap() to set up aliased regions at distinct addresses you are suddenly in hardware-dependent land (due to machines with VIPT caches such as, IIRC, MIPS, not being able to detect such aliasing at the caching level, so you suddenly need to introduce -- necessarily hardware-dependent -- cache flushes and memory barriers) so you have to know what you're doing anyway, and little things like memmove() falling back to memcpy() at unexpected times are things you're supposed to know about.

I hope.

Glibc change exposing bugs

Posted Nov 10, 2010 20:56 UTC (Wed) by clugstj (subscriber, #4020) [Link] (8 responses)

Even if there isn't currently-demonstrable benefit, there could be in the future, so why not get the buggy code fixed now instead of later? Yes, it's a very fine line, but, just my opinion, I don't have a problem w/ GlibC not reverting the change.

Glibc change exposing bugs

Posted Nov 10, 2010 22:24 UTC (Wed) by lmb (subscriber, #39048) [Link] (6 responses)

Because the user whose data has just been corrupted or whose important business meeting presentation just crashed or whose mail has been eaten no longer cares, and has switched to a less hostile platform.

That behavior is undefined makes one only right as far as technicality is concerned; it does not imply that changing it silently is good software engineering practice, nor that it is right in terms of software providing a service to users.

Glibc change exposing bugs

Posted Nov 10, 2010 23:58 UTC (Wed) by nix (subscriber, #2304) [Link] (4 responses)

But the compiler makes undefined stuff break all the time, and the set of undefined stuff which is broken is changed by all sorts of things. Nobody complained when LTO came in, although it surely broke programs relying on numerous instances of undefined behaviour which had been harmless before due to wider optimization opportunities when optimizing across translation units. So why complain about this? Just because Flash was affected?

Glibc change exposing bugs

Posted Nov 11, 2010 2:29 UTC (Thu) by foom (subscriber, #14868) [Link] (3 responses)

Because changes in the compiler don't break already-installed working binaries. They break newly compiled versions of software. Presumably such newly-compiled software gets tested, and if there's a problem, the program is perhaps recompiled with an older version of the compiler until the issue is fixed.

Glibc change exposing bugs

Posted Nov 11, 2010 7:29 UTC (Thu) by nix (subscriber, #2304) [Link] (2 responses)

That's a very large assumption indeed. When glibc gets recompiled, is everything on the distro tested? When libpng gets recompiled (every other week), is everything that uses it tested? I doubt it.

Glibc change exposing bugs

Posted Nov 11, 2010 17:30 UTC (Thu) by foom (subscriber, #14868) [Link] (1 responses)

Shrug, yet still, even if it was only discovered sometime later...if there was a bug in the new libpng binary that only appeared because it was compiled with a new gcc, it's still a bug in the new libpng binary that can be fixed by uploading yet another new libpng binary.

Here we have a new bug in flash which appeared without a new version of the flash binary being uploaded. It's a substantively different situation.

Glibc change exposing bugs

Posted Nov 12, 2010 7:12 UTC (Fri) by hozelda (guest, #19341) [Link]

If you update your libpng then the corruption already happened just as if you update glibc. [But the odds that problems will arise grow when you update glibc, because of its vast use.]

If you are that worried, you should work off stable versions or off a stable distributor that will manage this for you. You should not change key parts of the system if possible. glibc is a very key part. You should not update for optimizations, at least not without significant tests and only if you think the gains are worthwhile. Stick to security updates or when a crucial problem has been solved.

Anyway, when an important "bug" like this comes up, projects should audit the code. In this case, the possible entry points to potential problems can be identified quickly for many projects (just search for memcpy).

The case of glibc involves well-defined standards. Most libraries do not have such carefully defined semantics, and we must rely on access to source code for the juicy bits.

OK, despite what I just said, if the gains here are not that useful, glibc should revert, at least for the time being. Reverting should not hurt those that adjusted already and will save those that have not. On the other hand, when will be the right time to change? Will people remember to fix this problem or will we just have a repeat later on? [Again, if the gains are negligible, the change in glibc should probably be avoided.]

Glibc change exposing bugs

Posted Nov 11, 2010 0:50 UTC (Thu) by MattPerry (guest, #46341) [Link]

> That behavior is undefined makes one only right as far as technicality
> is concerned;

But it is defined. The man page says not to use that function on overlapping regions. That applications ignored that and still functioned for so long is more a matter of good luck. That luck has run out due to their poor implementations and they should now be fixed.

Glibc change exposing bugs

Posted Nov 11, 2010 1:15 UTC (Thu) by Lovechild (guest, #3592) [Link]

Perhaps if somehow one could get it to emit a warning message instead of crashing that might work. For now it might be best done in testing environments such as being enabled perhaps during the development cycle of a distribution.

Glibc change exposing bugs

Posted Nov 11, 2010 2:41 UTC (Thu) by quotemstr (subscriber, #45331) [Link] (8 responses)

> the kernel developers fixed things so that applications would not lose data even if they weren't following standard behavior

What some filesystem developers propose applications do isn't defined by any standard. POSIX, SUS, and so on don't state what happens after a crash, fsync() or not. The argument was over what to do in certain circumstances outside any standard. The argument was much muddled because one side kept claiming that its brand of brain damage was endorsed by the standard. Fortunately, sanity prevailed. Calling fsync() after every rename would have inconvenienced application developers and decreased performance.

memcpy, on the other hand, is clearly described by the relevant standards. Application developers deserve what they get.

Glibc change exposing bugs

Posted Nov 11, 2010 8:07 UTC (Thu) by bojan (subscriber, #14302) [Link] (7 responses)

> The argument was much muddled because one side kept claiming that its brand of brain damage was endorsed by the standard.

He, he... Nice try :-)

Nothing could be further from the truth. The problem is that the standard doesn't _specify_ in which order things should happen on the underlying FS, which then gives implementers the ability to implement _any_ order (which they do). Relying on a _particular_ order (which is completely undocumented, of course) by application writers is the problem.

The suggestion that the specification does not deal with crashes is irrelevant, because, once again, it doesn't specify _any_ behaviour. In other words, if your FS is hosed completely after a crash, that's OK. If it's half hosed, that's OK too. If it's completely OK, that's OK as well. Obviously, the _interesting_ case is when it's completely OK, in which case the _implemented_ ordering actually makes a difference. And, once again, _any_ ordering is OK, because the standard specifies _none_.

The only difference between this and the memcpy() fiasco is that in the case of rename() folks may get an _impression_ that the operation is atomic on the FS level, because it is atomic as viewed from the processes currently running on the system. Of course, this is documented nowhere, but is a common misreading of the standard.

With memcpy() it is quite clear overlapping regions should be copied with memmove().

Glibc change exposing bugs

Posted Nov 11, 2010 8:32 UTC (Thu) by Mook (subscriber, #71173) [Link] (6 responses)

Funnily enough... that has to do with glibc too. In particular, its manual on rename(): http://www.gnu.org/s/libc/manual/html_node/Renaming-Files...

Yes, glibc's rename() API guarantees atomic renames. Since normal applications do not make syscalls directly, but call the libc API to do it on their behalf, they are not to blame.

Glibc change exposing bugs

Posted Nov 11, 2010 8:46 UTC (Thu) by bojan (subscriber, #14302) [Link] (5 responses)

And even more "funnily", glibc doesn't deal with file system implementation (i.e. the persistence of the change) at all. In fact, that very page you pointed to states that strange things may indeed happen after a crash.

The atomicity of rename() refers to a view from the running system and not much else. But it has sure been misread a lot :-)

Glibc change exposing bugs

Posted Nov 11, 2010 9:06 UTC (Thu) by Mook (subscriber, #71173) [Link] (4 responses)

Hmm, odd; I parse "If there is a system crash during the operation, it is possible for both names to still exist; but newname will always be intact if it exists at all. " as "the file named by the destination will either not exist, or have some sort of sensible value, but not be truncated at zero bytes unless that was one of the two inputs".

Glibc change exposing bugs

Posted Nov 11, 2010 9:52 UTC (Thu) by bojan (subscriber, #14302) [Link] (3 responses)

You are confusing file names (i.e. what is recorded in the directory) with contents of files.

Glibc change exposing bugs

Posted Nov 11, 2010 13:49 UTC (Thu) by pbonzini (subscriber, #60935) [Link] (2 responses)

"intact" seems to refer to the contents?

Glibc change exposing bugs

Posted Nov 11, 2010 23:05 UTC (Thu) by bojan (subscriber, #14302) [Link] (1 responses)

Suppose there are two entries in the directory, with oldname being renamed to newname, and each (obviously) pointing to an inode. If the system crashes during the rename, it is possible that both will survive (because the directory was not committed to disk yet).

What glibc docs are talking about is that rename() is not implemented by copying content of the oldname to newname. So, if there was newname before rename and the directory commit doesn't go through, the content of newname will not be changed. It is a pure directory operation. On the other hand, if the directory gets committed, there will be just newname there, pointing to whatever content oldname had. All of that is if your FS knows how to survive a crash - otherwise situation is not interesting (well, unless you're the sysadmin recovering the mess :-).

Now note the situation from the ext4 "problem". The oldname content was not fsync()-ed to disk before the rename(). Ergo, when the directory got committed, oldname became newname on disk, pointing to zero bytes, due to delayed allocation. This has nothing to do with the fact that on unsuccessful (i.e. not committed before the crash) rename(), both oldname and newname would remain in the directory.
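
The ordering described above (flush the new contents, then rename) can be sketched as follows. This is a hedged illustration, not code from any project mentioned in the thread: `replace_file` and its parameters are hypothetical names, and a fully durable version would probably also fsync() the containing directory, which is omitted here.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write data to tmp, force it to disk, then rename
   tmp over path. tmp is assumed to be on the same filesystem as path.
   Returns 0 on success, -1 on error. */
int replace_file(const char *path, const char *tmp, const char *data)
{
    size_t len = strlen(data);
    size_t off = 0;
    int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    while (off < len) {                 /* write() may be partial */
        ssize_t w = write(fd, data + off, len - off);
        if (w < 0) {
            close(fd);
            unlink(tmp);
            return -1;
        }
        off += (size_t)w;
    }

    /* Flush file contents before the directory entry changes. */
    if (fsync(fd) != 0) {
        close(fd);
        unlink(tmp);
        return -1;
    }
    if (close(fd) != 0 || rename(tmp, path) != 0) {
        unlink(tmp);
        return -1;
    }
    return 0;
}
```

The point of the fsync() call is that the data reaches the disk before the rename can be committed, so a committed directory entry should not end up pointing at zero bytes.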

Glibc change exposing bugs

Posted Nov 12, 2010 7:12 UTC (Fri) by Mook (subscriber, #71173) [Link]

Thank you for the clear explanation! It does clearly say that I'm wrong :)

Glibc change exposing bugs

Posted Nov 10, 2010 20:07 UTC (Wed) by dlang (guest, #313) [Link] (51 responses)

Linus is being very consistent here.

if a userspace program does things that have been working, even if they weren't supposed to work, that's part of the ABI of the kernel and he is very reluctant to change anything, and will only do so when there is a _very_ compelling reason

Glibc change exposing bugs

Posted Nov 10, 2010 20:36 UTC (Wed) by JoeBuck (subscriber, #2330) [Link] (50 responses)

The existing memcpy implementation did copying in a forward direction, so it would give the expected result for memcpy(buf, buf + 4, 8) but a wrong result for memcpy(buf, buf - 4, 8). The change (in at least some circumstances) does the reverse, and both ways satisfy the spec, which says that src and dst must not overlap, and if they might, memmove should be used. Linus is apparently calling for the original implementation decision (forward, not backward) to be set in stone, even if a backward-copy might be faster on a particular processor. This doesn't seem right to me. However, it seems reasonable to provide a cleaner workaround until old code can be fixed (it might just be a cleaned-up version of his proposed LD_PRELOAD trick).

An alternative LD_PRELOAD, pointing to a memcpy that crashes for overlapping arguments, could be used to expose accidental misuse of the API.
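
A rough sketch of the checking logic such a debugging memcpy might contain is below. The names `ranges_overlap` and `checking_memcpy` are hypothetical. A real LD_PRELOAD shim would have to export its function under the name memcpy and could not simply call memcpy() itself, and that part is not shown. The comparison through uintptr_t assumes a flat address space, which ISO C does not promise.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Nonzero if [a, a+n) and [b, b+n) share at least one byte.
   Assumes pointers convert to integers in a flat address space. */
int ranges_overlap(const void *a, const void *b, size_t n)
{
    uintptr_t pa = (uintptr_t)a;
    uintptr_t pb = (uintptr_t)b;
    uintptr_t gap = pa > pb ? pa - pb : pb - pa;
    return n != 0 && gap < n;
}

/* Hypothetical debugging stand-in for memcpy(): crash loudly on
   overlapping arguments instead of silently corrupting data. */
void *checking_memcpy(void *dest, const void *src, size_t n)
{
    if (ranges_overlap(dest, src, n))
        abort();
    return memcpy(dest, src, n);
}
```

For two ranges of equal length n, they overlap exactly when the distance between their start addresses is less than n, which is why a single subtraction suffices.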

Glibc change exposing bugs

Posted Nov 10, 2010 20:51 UTC (Wed) by clugstj (subscriber, #4020) [Link]

I think that history has shown that old code won't get fixed until it actually manifests itself as broken - at least in the commercial world.

Please don't attack strawmen. Thnx.

Posted Nov 10, 2010 22:47 UTC (Wed) by khim (subscriber, #9252) [Link] (2 responses)

The actual quote:

So in the kernel we have a pretty strict "no regressions" rule, and that if people depend on interfaces we exported having side effects that weren't intentional, we try to fix things so that they still work unless there is a major reason not to.

...

Regardless, it boils down to: we know the glibc change resulted in problems for real users. We do _not_ know that it helped anything at all.

Linus is OK with changes that break buggy programs (it happened before, it'll happen again) but only if there is a "major reason". What's the justification for this particular case?

Please don't attack strawmen. Thnx.

Posted Nov 10, 2010 23:17 UTC (Wed) by bojan (subscriber, #14302) [Link] (1 responses)

> What's the justification for this particular case?

Linus couldn't play his favourite YouTube videos ;-)

Please don't attack strawmen. Thnx.

Posted Nov 11, 2010 1:31 UTC (Thu) by jonabbey (guest, #2736) [Link]

Ah! Andreas was trying to get Linus to quit wasting time on YouTube and get back to kernel development.

It's not, in fact, a bug. It's a feature.

Glibc change exposing bugs

Posted Nov 10, 2010 23:15 UTC (Wed) by charlieb (guest, #23340) [Link] (45 responses)

> ... and both ways satisfy the spec, which says that src and dst must
> not overlap, ...

Does it? The man page says:

The memory areas should not overlap.

It does not say:

The memory areas must not overlap.

It also says:

The memcpy() function copies n bytes from memory area src to
memory area dest.

It doesn't say:

The memcpy() function copies n bytes from memory area src to
memory area dest, unless the memory areas overlap.

"should" provisions are not mandatory. Unless you decide to redefine the terminology.

Glibc change exposing bugs

Posted Nov 10, 2010 23:23 UTC (Wed) by bojan (subscriber, #14302) [Link]

Nice exercise in verbal gymnastics. However, you forgot:

> Use memmove(3) if the memory areas do overlap.

Glibc change exposing bugs

Posted Nov 10, 2010 23:24 UTC (Wed) by donwaugaman (subscriber, #4214) [Link] (42 responses)

Hmm... the man page on my desktop computer (RHEL4) says:

If copying takes place between objects that overlap, the behavior is undefined.

In the context of standardese, that specifies that exactly anything can happen in the event of overlapping memory areas, with no 'should' or 'must' about it. The standard doesn't set down any rules that a developer must follow, only what will happen under certain conditions (in this case, the result is 'anything').

'must' and 'should' are more in the vein of RFCs.

Glibc change exposing bugs

Posted Nov 11, 2010 0:25 UTC (Thu) by nicooo (guest, #69134) [Link] (38 responses)

On my system there are two man pages, one from POSIX and one from the linux man-pages project.

Glibc's info page says it's undefined. It's the official documentation but nobody uses info.

Glibc change exposing bugs

Posted Nov 11, 2010 0:28 UTC (Thu) by bojan (subscriber, #14302) [Link]

And both clearly state that if regions overlap, one should use memmove().

Glibc change exposing bugs

Posted Nov 11, 2010 0:33 UTC (Thu) by charlieb (guest, #23340) [Link]

> On my system there are two man pages, one from POSIX and one from
> the linux man-pages project.

Ideally the linux man-page will be clarified. "should" there seems just a recommendation. Not "your software will eat babies unless you do this".

Glibc change exposing bugs

Posted Nov 11, 2010 2:42 UTC (Thu) by butlerm (subscriber, #13312) [Link] (35 responses)

> It's the official documentation but nobody uses info.

That is because 'info' is user hostile and dangerously close to useless. A web search is a dozen times faster than navigating an info document.

Glibc change exposing bugs

Posted Nov 11, 2010 6:41 UTC (Thu) by HelloWorld (guest, #56129) [Link] (25 responses)

What's useless isn't info, but man, at least for documentation spanning more than a couple of pages (like for gcc or mplayer).

Glibc change exposing bugs

Posted Nov 11, 2010 9:46 UTC (Thu) by mpr22 (subscriber, #60784) [Link] (12 responses)

For large manuals, my experience is that info merely sucks less than a man page; the user interface of both /usr/bin/info and /usr/bin/emacs -f info is horrible. For simple things, man wins by a country mile, because it doesn't slice-and-dice a simple program's documentation into 742 one-paragraph pages.

info considered harmful?

Posted Nov 11, 2010 22:47 UTC (Thu) by vonbrand (guest, #4458) [Link] (1 responses)

Try pinfo

info considered harmful?

Posted Nov 12, 2010 14:08 UTC (Fri) by jzbiciak (guest, #5246) [Link]

Another vote for pinfo. It doesn't hate me for wanting to know something like "info" does.

Glibc change exposing bugs

Posted Nov 11, 2010 22:56 UTC (Thu) by HelloWorld (guest, #56129) [Link] (8 responses)

Dividing manuals into chunks of a sensible size is a feature, not a bug. And if you don't like GNU info or emacs, just use something else. You can view info manuals with konqueror by typing info:<program name> into the address bar, and yelp is also capable of displaying info documents.

Glibc change exposing bugs

Posted Nov 12, 2010 18:19 UTC (Fri) by sorpigal (guest, #36106) [Link] (1 responses)

I don't know about you but for anything less than ten pages I find man much easier than info for one very simple reason: It's easy to scroll through a stream of text. It's also easier to hit / and search the whole document, it's easy to not get lost, etc. Info's problem is that info readers don't default to a man-like one-big-document, which is well known, well accepted and suitable to a terminal (which is, I imagine, where most man and info pages are consumed).

I've used pinfo and it helps some in the UI department, but I'd still use man over pinfo for almost every trivial lookup. If your goal is to completely replace man then your system needs to be a drop-in replacement from a user interaction point of view, with the advantages discoverable by users who are interested in learning them.

Glibc change exposing bugs

Posted Nov 12, 2010 18:45 UTC (Fri) by foom (subscriber, #14868) [Link]

> easier to hit / and search the whole document

Not really: "info" also searches the whole document if you hit /. (although I share the general dislike for the info browser).

Glibc change exposing bugs

Posted Nov 25, 2010 15:22 UTC (Thu) by Spudd86 (guest, #51683) [Link] (5 responses)

info's major problem is that its interface SUCKS, there's no real 'back' command, the keybindings are just plain weird (unless you're an EMACS user...).

It'd be nice to have an info viewer that converts to HTML on the fly and uses webkit to render it.

Glibc change exposing bugs

Posted Nov 25, 2010 22:13 UTC (Thu) by paulj (subscriber, #341) [Link] (4 responses)

Have you tried going to System -> Help? GNOME's "Yelp" supports browsing info docs - providing a web browser style GUI...

Glibc change exposing bugs

Posted Nov 26, 2010 0:31 UTC (Fri) by Spudd86 (guest, #51683) [Link] (3 responses)

I don't use GNOME. I wonder how much of GNOME Yelp pulls in.

Glibc change exposing bugs

Posted Nov 26, 2010 0:40 UTC (Fri) by sfeam (subscriber, #2841) [Link] (2 responses)

You could use konqueror instead
konqueror info:tar

Glibc change exposing bugs

Posted Nov 26, 2010 1:33 UTC (Fri) by Spudd86 (guest, #51683) [Link] (1 responses)

Don't use KDE either... I use XFCE and try to keep most of the GNOME stuff not installed.

Glibc change exposing bugs

Posted Nov 27, 2010 13:26 UTC (Sat) by paulj (subscriber, #341) [Link]

Well, if you want a web interface style GUI for info, but don't want to install either of the main two GUI environments, then... ;) Pinfo possibly is closest to what you want. A lynx/elinks style browser interface, for the terminal.

Glibc change exposing bugs

Posted Nov 12, 2010 10:36 UTC (Fri) by marcH (subscriber, #57642) [Link]

You are mixing in the same very short post three entirely unrelated things:
- the info format
- the info reader
- how fine the writer sliced the document
Very confusing.

Glibc change exposing bugs

Posted Nov 12, 2010 5:01 UTC (Fri) by nicooo (guest, #69134) [Link] (5 responses)

The rest of the world uses HTML and PDF for that kind of documentation.

Glibc change exposing bugs

Posted Nov 12, 2010 7:33 UTC (Fri) by paulj (subscriber, #341) [Link]

Funnily enough, a lot of PDFs (that I read, anyway) are written in some other language and generated through TeX, with PDF being just one possible output format. Which is just how GNU _Tex_info works too.

Glibc change exposing bugs

Posted Nov 12, 2010 10:42 UTC (Fri) by marcH (subscriber, #57642) [Link]

HTML does not support indexes, a very useful feature of the info document format. I find most PDF viewers cumbersome for screen browsing; not very surprising since it is a *printer* format at the core.

I find it too bad that a not-so-good default user interface is rebuffing users before they even start to see the nice features of the format. The fix is to promote alternative user interfaces, something I keep doing constantly (and which has already been done here).

Glibc change exposing bugs

Posted Nov 12, 2010 13:59 UTC (Fri) by HelloWorld (guest, #56129) [Link] (1 responses)

What's your point? You can generate both PDF and HTML from info.

Glibc change exposing bugs

Posted Nov 12, 2010 20:10 UTC (Fri) by nicooo (guest, #69134) [Link]

That's texinfo. Using info for online documentation is what everyone hates.

Glibc change exposing bugs

Posted Nov 12, 2010 23:32 UTC (Fri) by Wol (subscriber, #4433) [Link]

And pdf is (done properly) one big page, just like man :-)

Which is why I like man, and like pdf, and just curse profusely every time I'm exhorted to use info!

Cheers,
Wol

Glibc change exposing bugs

Posted Nov 12, 2010 13:52 UTC (Fri) by Wol (subscriber, #4433) [Link] (5 responses)

I'd actually say the complete opposite! Even for a complex chunk of documentation, I'd rather have man than info.

At least with man, I can scroll down (or search) until I find what I'm looking for.

info, on the other hand, "you are in maze of twisty little passages all alike". When presented with the instruction to "use info", I give up and use the web. When presented with a 1000-line man page, no problem ... :-)

Cheers,
Wol

Glibc change exposing bugs

Posted Nov 12, 2010 14:06 UTC (Fri) by HelloWorld (guest, #56129) [Link] (4 responses)

> I'd actually say the complete opposite! Even for a complex chunk of documentation, I'd rather have man than info.

> At least with man, I can scroll down (or search) until I find what I'm looking for.

So you can with info. You can search the complete manual with the s key. The fact that you don't know this indicates you don't bother to read documentation at all really.

> info, on the other hand, "you are in maze of twisty little passages all alike".

If you had actually read the headings of the "twisty little passages", you would have found that they're really not alike at all. Alas, you don't seem to have bothered and decided to pointlessly whine about info instead.

Glibc change exposing bugs

Posted Nov 12, 2010 19:11 UTC (Fri) by bronson (subscriber, #4806) [Link]

Wonder if self-important replies like this have contributed to info's utter obscurity...?

Take a deep breath dude. Different people like different things.

Glibc change exposing bugs

Posted Nov 12, 2010 23:39 UTC (Fri) by Wol (subscriber, #4433) [Link] (2 responses)

Ah. "s" for "search".

The problem with that is if I can't articulate what I'm searching for. The number of times I've searched on what I think is the obvious search key, wasted half-an-hour or so doing it, then done a manual scroll through whatever I can find.

I then find what I'm looking for, and discover that it's called something (to me) extremely obscure, and doesn't mention my search term at all, etc etc.

Plus the fact that I'm one of those strange people who actually DOES tend to read documentation, from cover to cover, and likes to have a straight line path through it, not with redirects and jumps and god knows what all over the place. About the only place I can find information on info is in info - and if I find info repellent, how on earth am I going to find out how to use it if I have to use it to find out?

THERE is your problem with info - if you hate it because you can't find out how to use it, it's catch 22. You need to know how to use it to find out how to use it :-)

Cheers,
Wol

Glibc change exposing bugs

Posted Nov 13, 2010 0:42 UTC (Sat) by foom (subscriber, #14868) [Link] (1 responses)

Oh come on, if you can't stand to use "info info" long enough to figure out that you can use "space" and "backspace" to scroll forward and backward through the document (including going to the next page automatically upon reaching the end of the current one), then I dunno what to say.

Glibc change exposing bugs

Posted Nov 14, 2010 22:32 UTC (Sun) by nix (subscriber, #2304) [Link]

Well, info's handling of backspace in particular has long been buggy: it has a habit of going up to the top of the current page only, and then halting. Space has always worked, though.

Glibc change exposing bugs

Posted Nov 11, 2010 7:31 UTC (Thu) by nix (subscriber, #2304) [Link] (8 responses)

And POSIX is on the web and in the 3p manpages, so developers *still* have no excuse. (They should be developing to the POSIX manpages anyway, not the Linux ones.)

Glibc change exposing bugs

Posted Nov 12, 2010 7:30 UTC (Fri) by hozelda (guest, #19341) [Link] (7 responses)

I think the man pages that say "should" may want to clarify that issue a little better; however, it does appear to have the correct information.

If you use Linux, the Linux documentation should be authoritative. Hopefully, it will agree with POSIX and C99 (or whatever is the latest memcpy standard) as much as possible. If there is a reason for a change (or to document a Linux bug) and you use Linux, I would pay attention to the Linux documentation and treat everything else as advisory. If you use Red Hat or whatever other distro, I would treat those docs as authoritative and not whatever other standard you think should apply.

A different matter is arguing about keeping Linux in sync with POSIX, etc, but if you want to build software that will work, short of maintaining your personal set of patches not accepted by upstream, you would probably want to code to "Linux" (at least for the Linux port).

Glibc change exposing bugs

Posted Nov 12, 2010 7:37 UTC (Fri) by hozelda (guest, #19341) [Link]

Before you say that a man page is not authoritative, I don't know the answer to that but it depends on your Linux vendor. In practice you will want to follow the major standards and consider otherwise to probably be an error in the man page; however, if your vendor says that X and Y are the documents, then that is what you go by (perhaps bringing up doubtful points to your vendor's attention). In particular, if you don't like a vendor that hacks Linux to bypass certain standards, then change vendors or ask for help in identifying these hacks.

Glibc change exposing bugs

Posted Nov 14, 2010 21:19 UTC (Sun) by nix (subscriber, #2304) [Link] (5 responses)

No, you normally want to code to POSIX. Carefully-written software does not *require* much if any porting to work on Linux rather than Solaris or IRIX or even sometimes AIX. If it's POSIX, it should work.

(You might need to adjust for bits of older systems that are non-POSIX, but that is really quite rare these days unless you're aiming for some strange emulation layer like Cygwin. Also you might need to do byteorder detection and so forth, but, again, that's stuff which is left unspecified by POSIX. You should not generally have to use Linux-specific stuff unless you really want to, and you normally shouldn't want to.)

Glibc change exposing bugs

Posted Nov 14, 2010 22:03 UTC (Sun) by promotion-account (guest, #70778) [Link] (3 responses)

> You should not generally have to use Linux-specific stuff unless you really want to, and you normally shouldn't want to.

I'm sure you know this, but for some applications, POSIX is not really enough. Thus, for example, the need for some portable abstraction libraries like libevent.

Glibc change exposing bugs

Posted Nov 14, 2010 23:08 UTC (Sun) by nix (subscriber, #2304) [Link] (2 responses)

Yes, exactly. But at worst you should stuff the nonportability into a library with an API which can be replicated on other platforms (or make that library as portable as possible, and keep it a separate library to keep the ugly away from everyone else.)

(btw, your account name is... *interesting*.)

Glibc change exposing bugs

Posted Nov 15, 2010 1:37 UTC (Mon) by promotion-account (guest, #70778) [Link] (1 responses)

> (btw, your account name is... *interesting*.)

That's descriptive anonymity :)

Readers usually give higher weight to subscribers' opinions here, so this handle honestly states that I'm a promoted guest.

Glibc change exposing bugs

Posted Nov 15, 2010 10:39 UTC (Mon) by nix (subscriber, #2304) [Link]

Ah. I interpreted it as 'account bought to promote something else', and got confused because most advertisers would try to lie about it and *not* mention their affiliations :)

'Promotion' is a word with many meanings...

Glibc change exposing bugs

Posted Nov 15, 2010 8:13 UTC (Mon) by dlang (guest, #313) [Link]

you are assuming that the program authors care about Irix, AIX, Solaris, or anything else.

most programs do not start off being written portably, usually portability is something that shows up after the program starts being used when people ask about using it on other platforms (and it's not uncommon for it to wait until those people asking submit patches)

not saying that this is right, just saying that it's the way things are. When Solaris dominated the same thing happened favoring it.

Glibc change exposing bugs

Posted Nov 11, 2010 0:30 UTC (Thu) by charlieb (guest, #23340) [Link] (2 responses)

> Hmm... the man page on my desktop computer (RHEL4) says:

What manpage is that? The memcpy(3) manpage on my CentOS4 box does not say "the behavior is undefined". Ah, I see that the memcpy(3p) one does.

> 'must' and 'should' are more in the vein of RFCs.

OK. But at least those are clear. "should" in the context of an API man page is not.

Glibc change exposing bugs

Posted Nov 11, 2010 12:23 UTC (Thu) by gidoca (subscriber, #62438) [Link]

> OK. But at least those are clear. "should" in the context of an API man page is not.
If "should" is interpreted the way you do, then they might as well have omitted the sentence.

Glibc change exposing bugs

Posted Nov 11, 2010 18:03 UTC (Thu) by donwaugaman (subscriber, #4214) [Link]

memcpy(3) on the same RHEL4 box says:

"The memory areas may not overlap."

... which sounds a little stronger than "should" to me.

Not sure why CentOS4 differs...

At any rate, arguing over the man pages is irrelevant to the standard - if the man pages don't match the standard, the man pages need to be fixed rather than the standard.

That being said, it would sure be nice to have some kind of formal deprecation of the previous behavior. One of the nice things about the free software world is that it should be more possible to make these kinds of changes because it's easier to change the programs whose assumptions worked OK with the previous behavior but are violated by the new behavior. Of course, with closed-source Flash players, that goes out the window, and it becomes a question of whether it is more important to pacify Adobe users or to give Adobe an incentive to clean up its software.

Glibc change exposing bugs

Posted Nov 11, 2010 0:04 UTC (Thu) by nix (subscriber, #2304) [Link]

POSIX states:

> If copying takes place between objects that overlap, the behavior is undefined.

The behaviour of Linux (and Unix) systems in this area is governed by POSIX, not a random manpage. (And in this case POSIX is aligned with ISO C, and even uses the same phrasing.)

Glibc change exposing bugs

Posted Nov 10, 2010 21:33 UTC (Wed) by stijn (guest, #570) [Link] (5 responses)

A salient point made by Linus was that there has never been a warning, and thus there has been no test coverage. In a communal view of software production and use it seems a bit unthoughtful to push this through and let (less technical) users suffer. It makes the software and the makers look bad. It makes it worse if that is shrugged off in a disdainful manner.

Glibc change exposing bugs

Posted Nov 10, 2010 21:47 UTC (Wed) by jedbrown (subscriber, #49919) [Link] (4 responses)

Valgrind produces exactly such a warning.

Glibc change exposing bugs

Posted Nov 10, 2010 22:09 UTC (Wed) by stijn (guest, #570) [Link]

You are right. My point would have been phrased better like this:

In a communal view of software production and use it seems a bit unthoughtful to push this through and let (less technical) users suffer. It makes the software and the makers look bad. It makes it worse if that is shrugged off in a disdainful manner.

Glibc change exposing bugs

Posted Nov 11, 2010 18:47 UTC (Thu) by oak (guest, #2786) [Link] (2 responses)

> Valgrind produces exactly such a warning.

And it has been doing it for nearly a decade. And of course many other free memory debugging facilities like Duma (improved version of Electric Fence), mpatrol etc. produce these warnings too. As I would assume proprietary ones (on other platforms) to do also...

One could also define _FORTIFY_SOURCE to turn memcpy() etc into checking, slower versions. For more info, see:
* http://wiki.debian.org/Hardening
* https://wiki.ubuntu.com/CompilerFlags

Glibc change exposing bugs

Posted Nov 17, 2010 14:45 UTC (Wed) by meuh (guest, #22042) [Link] (1 responses)

As I said here https://bugzilla.redhat.com/show_bug.cgi?id=638477#c116

Using -D_FORTIFY_SOURCE only enables checks for overflow when the source and destination lengths are known (or can be computed).

The _chk() variants of memset(), memcpy(), etc. don't check for overlap.

And one should know that GCC provides inline versions of such functions, so valgrind won't be able to overload them and provide stronger argument checking.

Glibc change exposing bugs

Posted Nov 17, 2010 19:08 UTC (Wed) by oak (guest, #2786) [Link]

> And one should know that GCC provides inline versions of such functions

Wasn't this article about Glibc memcpy(), not the GCC (libgcc?) one?

Anyway, AFAIK GCC does that only if the code is compiled with optimizations. Valgrind and -O0 compiled code are a pretty horrible combination speed-wise, though. In that case it might be better to use one of the other memory debugging tools that don't do CPU emulation like Valgrind does...

Note that GCC doesn't inline its memcpy() code just for explicit (fixed size) memcpy() calls. An inlined version may also be used for assignments, and developers are able to mess up the addresses of variables used in things like this too:

  struct foobar_t *a = arg1, *b = arg2;
  ...
  *a = *b;

(I found this issue with implicit GCC memcpy() when my code didn't have correct alignment for one of the above kind of pointers on a platform that required things to be properly aligned. That it triggered a kernel alignment exception handler bug had me scratching my head until a more knowledgeable colleague came to the rescue... I think with overlapping pointer addresses the results may be even more mysterious, as they show up later.)

Glibc change exposing bugs

Posted Nov 10, 2010 21:51 UTC (Wed) by nix (subscriber, #2304) [Link]

Also, everyone has known not to do this since C was young, long long before Linux ever existed.

My reaction here is the same as HJ's: if a function as speed-critical as memcpy() is made faster by a change, and it breaks overlapping memcpy()s, the fault is the overlapper for being bloody stupid. (However, if Linus is right that this isn't actually speeding anything up except perhaps in microbenchmarks, then the large size of the 'fast' memcpy() implementation becomes an issue.)

Glibc change exposing bugs

Posted Nov 10, 2010 19:41 UTC (Wed) by mrshiny (subscriber, #4266) [Link] (16 responses)

I had to do something like this years ago when glibc changed and quake2 stopped working. Again, it was sound-related, and again, it was memcpy optimizations that broke things. I made a simple preload for memcpy that did the dumbest possible memcpy I could think of, which worked well enough for quake2. I'm a little disappointed that history is repeating itself again.

Glibc change exposing bugs

Posted Nov 10, 2010 19:49 UTC (Wed) by jwb (guest, #15467) [Link] (15 responses)

I'm glad to see glibc willing to commit changes like these. A modern CPU is absurdly fast and I want my software to run as quickly as possible. I specifically DO NOT WANT progress on glibc performance to be sacrificed to the implementation flaws of proprietary crapware ported from other platforms.

Glibc change exposing bugs

Posted Nov 10, 2010 20:01 UTC (Wed) by mrshiny (subscriber, #4266) [Link] (3 responses)

Please.

First of all, glibc supports proprietary software, which is why they allow proprietary software to link to it. So punishing certain programs for license choice by making subtle (and sometimes unjustifiable) changes to API seems like a highly passive-aggressive approach to their ideology.

Second of all, there MUST be a way to preserve backwards compatibility AND allow for future progress. Remember: all of your open-source programs which exhibit this bug are just as broken as Flash, except that if someone tracks it down in the Free software it can be fixed for future versions. But that doesn't help anyone who already has the software installed. I just can't imagine that the glibc people couldn't come up with ANY approach that works for everyone. Off the top of my head I can think of several; probably there are reasons why they are problematic, but in that bugzilla entry even Linus Torvalds was unable to prove that the glibc approach was warranted at all, let alone warranted for everyone all the time.

Glibc change exposing bugs

Posted Nov 10, 2010 22:49 UTC (Wed) by cesarb (subscriber, #6266) [Link] (1 responses)

> So punishing certain programs for license choice by making subtle (and sometimes unjustifiable) changes to API seems like a highly passive-aggressive approach to their ideology.

This was not an API change. The memcpy() API has always been that the regions cannot overlap, and has been so for decades. This was just a change in the implementation details.

Glibc change exposing bugs

Posted Nov 11, 2010 0:33 UTC (Thu) by marcH (subscriber, #57642) [Link]

> This was not an API change.

Agreed, let's not confuse API change with "change of undefined behaviour". It may look the same but it is different.

That's what symbol versioning is all about.

Posted Nov 10, 2010 23:03 UTC (Wed) by khim (subscriber, #9252) [Link]

Glibc does have such a mechanism: it's called ELF symbol versioning. But the policy does not cover cases like the one discussed: if it's a bug in a program (and the fact that regions must not overlap is well documented... heck, it's the reason for memmove(3)'s existence), then there will be no new version of the function.

The question of "do we actually want such a change" is a separate issue.

Glibc change exposing bugs

Posted Nov 10, 2010 20:05 UTC (Wed) by dlang (guest, #313) [Link] (10 responses)

did you read the comments and see that Linus published benchmarks that showed that this change was not faster if the data was cached, and in the noise if the data is not cached (and even then the non-glibc version won 6 of 10 tests, but the results were very noisy)?

Glibc change exposing bugs

Posted Nov 10, 2010 20:08 UTC (Wed) by jwb (guest, #15467) [Link] (9 responses)

Sure, but Linus doesn't even say what CPU he tested on. The commit message for the change to memcpy claims a 4x improvement on Core 2.

Glibc change exposing bugs

Posted Nov 10, 2010 21:03 UTC (Wed) by fuhchee (guest, #40059) [Link]

It would be nice if such patches were accompanied not just by statements of seen performance changes, but by actual reproduction scripts, so one can test for oneself.

Glibc change exposing bugs

Posted Nov 10, 2010 23:05 UTC (Wed) by patrick_g (subscriber, #44470) [Link] (7 responses)

>>> Linus doesn't even say what CPU he tested on

Linus uses a Core i5 at home (see this post).

Glibc change exposing bugs

Posted Nov 11, 2010 0:26 UTC (Thu) by jwb (guest, #15467) [Link] (5 responses)

Then it makes sense that he didn't see a speedup because the patch claims 1x speeds on Core i7. The improvement is seen on Atom and Core2.

Glibc change exposing bugs

Posted Nov 11, 2010 5:01 UTC (Thu) by nicooo (guest, #69134) [Link] (3 responses)

This change is starting to sound more and more pointless.

Glibc change exposing bugs

Posted Nov 11, 2010 6:54 UTC (Thu) by madscientist (subscriber, #16861) [Link] (2 responses)

Well, I have a core2 and I'd LOVE to get 4x speed improvement on memcpy(), which is probably the single most widely used function in all of C programming.

So it doesn't sound useless to me.

Glibc change exposing bugs

Posted Nov 11, 2010 9:42 UTC (Thu) by dgm (subscriber, #49227) [Link]

Also, any improvement to Atom performance can only be welcome.

Glibc change exposing bugs

Posted Nov 11, 2010 12:00 UTC (Thu) by alankila (guest, #47141) [Link]

I took a simple oprofile trace of me using my desktop for a while. Sad to say, but in that trace glibc's memcpy() used 0.24 % of time.

It would probably be a better idea for the majority of systems to just remove memcpy() and replace it with memmove(), which showed up with 0.17 %. Together, that would add up to 0.5 % at most, I suppose.

Glibc change exposing bugs

Posted Nov 11, 2010 16:26 UTC (Thu) by jedbrown (subscriber, #49919) [Link]

Here's my benchmark on Core 2. The Core 2 implementation in glibc-2.12.1 is *forward* (not reverse like on Nehalem, I don't know which way it is on Atom), and the performance difference between glibc and Linus' implementation goes both ways.

http://www.reddit.com/r/programming/comments/e4bq0/glibc_...

Glibc change exposing bugs

Posted Nov 11, 2010 0:57 UTC (Thu) by MattPerry (guest, #46341) [Link]

> Linus use a Core i5 at home (see this post).

But we don't know if he ran his test on that computer. There's not enough information to tie the two together. It would help if Linus stated what CPU he used to run the test on.

Glibc change exposing bugs

Posted Nov 10, 2010 19:51 UTC (Wed) by mattdm (subscriber, #18) [Link] (40 responses)

It seems like the right thing for distributions to do (whether or not glibc upstream finds it a good idea) would be to use a version of memcpy that crashes when overlapping memcpy is detected. After several years of that, the new version could be used more safely.

Glibc change exposing bugs

Posted Nov 10, 2010 20:05 UTC (Wed) by gus3 (guest, #61103) [Link] (2 responses)

Or, one that dumps core and/or logs to /var/log/debug (or whatever is configured to handle debugging messages). The more noise it makes, the more likely it is to get fixed, and quickly, once distributors start getting lots of automated backtraces in their bug trackers.

I learned the difference between memcpy() and memmove() in my very first C programming class. Adobe should be embarrassed that their programmers can't read documentation.

Glibc change exposing bugs

Posted Nov 11, 2010 12:38 UTC (Thu) by nye (guest, #51576) [Link] (1 responses)

Even better would be if we could design a system that *shoots the user in the head* every time a programmer writes something that isn't mathematically proven to be flawless in design and implementation.

That would be open source utopia I think.

Glibc change exposing bugs

Posted Nov 11, 2010 14:46 UTC (Thu) by nix (subscriber, #2304) [Link]

I would support such a system as long as large-calibre weapons were used and the option was available to remotely shoot the developer instead.

;}

Glibc change exposing bugs

Posted Nov 10, 2010 20:13 UTC (Wed) by jwb (guest, #15467) [Link] (36 responses)

It could also trap the overlapping memcpy, log, and switch to the unoptimized memcpy.

It could also trap the overlapping memcpy and switch to one that has a lot of sleep() calls in the inner loop. That might alert the ignorant programmers of ghastly Adobe software to their API abuse.

Glibc change exposing bugs

Posted Nov 10, 2010 21:17 UTC (Wed) by lmb (subscriber, #39048) [Link] (34 responses)

memcpy detecting that the areas overlap? What's next, are you going to propose making POSIX threads (and syscall behaviour w/the same) programmer friendly? ;-)

Glibc change exposing bugs

Posted Nov 10, 2010 21:52 UTC (Wed) by nybble41 (subscriber, #55106) [Link] (33 responses)

The memcpy() routine is used in a wide variety of places, sometimes implicitly, to copy small amounts of data. For example, implicit memcpy() calls may be inserted by the compiler whenever an initialized array is allocated on the stack. A test+branch based on the order of the operands may take just as long as copying the data. Ergo, adding even small amounts of additional error-checking to memcpy() may have a significant impact on performance.

The restrictions on memcpy() are hardly unique; *most* APIs do not tolerate overlapping memory regions. The memmove() routine is an exception. If you want a nice "safe" way to copy some data between buffers which may or may not overlap, and don't care so much about performance, just use memmove() everywhere.

While forward compatibility is a good thing in general, it is unreasonable for API developers to feel bound to support obvious *misuse* of their APIs which directly contradicts explicit API documentation, which is exactly what is happening here. Given that any broken applications can be trivially patched with a simple LD_PRELOAD, I see no reason not to permit this change to the internal implementation of memcpy() in glibc.
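For readers wondering what such an LD_PRELOAD patch might look like, here is a minimal sketch: a tiny shared object that overrides memcpy() and forwards to memmove(), which is defined for overlapping regions. The file name and the build and run commands in the comment are assumptions for illustration, not something taken from the bug report.

```c
/* memcpy_shim.c (hypothetical file name).
 * Assumed build: gcc -O2 -fno-builtin -fPIC -shared -o memcpy_shim.so memcpy_shim.c
 * Assumed usage: LD_PRELOAD=./memcpy_shim.so ./some_program
 */
#include <stddef.h>
#include <string.h>

/* Override memcpy() with overlap-safe semantics by forwarding to memmove(). */
void *memcpy(void *dst, const void *src, size_t n)
{
    return memmove(dst, src, n);
}
```

This only helps for calls that actually go through the dynamic symbol; copies that the compiler expanded inline inside the broken application are not affected by a preloaded library.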

Glibc change exposing bugs

Posted Nov 10, 2010 22:18 UTC (Wed) by lmb (subscriber, #39048) [Link] (32 responses)

To be honest, I disagree. Yes, the programmers are misusing memcpy(), and relied on unspecified behavior. Yes, they should correct their code.

But the code worked so far - when users upgrade their glibc, suddenly their programs break, or possibly corrupt the user's data. How's that good?

Yes, it pushes users to complain to the developer (if they still can, and their e-mail/internet bits aren't affected), but it leaves them with a bitter aftertaste for the platform/ecosystem that forces developers to fix bugs at their users' expense.

The code should start with emitting a warning to the logs (once per program run, otherwise it becomes a DoS). The compiler could start warning if it detects the possibility of this happening (or coverity/valgrind etc all can). Possibly taunt developers publicly if you spot those messages in your logs.

But breaking underneath an unsuspecting user? Horribly, horribly wrong.

Glibc change exposing bugs

Posted Nov 10, 2010 22:34 UTC (Wed) by jwb (guest, #15467) [Link] (29 responses)

Your philosophy seems to be that if Flash never fixes this bug, we can never have the faster glibc memcpy. Why should free software be blocked by secret developments at Adobe?

Glibc change exposing bugs

Posted Nov 10, 2010 23:59 UTC (Wed) by lmb (subscriber, #39048) [Link] (24 responses)

You're focusing on Flash. I am not. The same applies to all buggy projects.

Knowingly introducing a change with consequences that aren't just mere crashes, but data corruption, for end users - if you cannot see how that is wrong, I have no idea how to help explain it.

Yes, of course, the performance achievement is worth having. However, not at this cost. Not without first adding some audit logging. Not without giving developers time to fix that. It is an incompatible change in the ABI.

I can read the man page as well as you can. Sure, the applications are buggy; that does not give one the right to corrupt user data. Such changes need to be phased in carefully; not in a "I am more righteous than you" style. It is bad enough when it happens by accident; intentionally, it is malpractice.

Glibc change exposing bugs

Posted Nov 11, 2010 0:05 UTC (Thu) by bojan (subscriber, #14302) [Link] (2 responses)

What else uses glibc-2.12.90 out there apart from Fedora 14 and rawhide? Next to nothing. So, the change is being rolled out carefully. Only a limited number of people will be exposed to it - the ones willing to run the latest Fedora.

Glibc change exposing bugs

Posted Nov 11, 2010 16:35 UTC (Thu) by jedbrown (subscriber, #49919) [Link] (1 responses)

I assume you mean glibc-2.12 or glibc-2.12.1. 2.12 has been in Arch Linux since May, 2.12.1 since August. This issue has not affected me because the new memcpy is still forward on Core 2.

Glibc change exposing bugs

Posted Nov 12, 2010 0:02 UTC (Fri) by bojan (subscriber, #14302) [Link]

No, I mean 2.12.90:

glibc-2.12.90-18.x86_64
glibc-2.12.90-18.i686

This is an unreleased version of glibc. Fedora does this from time to time - ship early cuts of new glibc (this will be 2.13 one day).

Glibc change exposing bugs

Posted Nov 11, 2010 0:31 UTC (Thu) by jwb (guest, #15467) [Link] (14 responses)

"First adding some audit logging".

You are proposing that it would be wise to have a test and branch at every entry into memcpy? That is madness.

Glibc change exposing bugs

Posted Nov 11, 2010 1:52 UTC (Thu) by gus3 (guest, #61103) [Link] (13 responses)

No, it is simple integer math:

if ((p1 + length <= p2) || (p2 + length <= p1)) {
    crash_and_burn();
}

It is not a sophisticated test, and the more noise it makes about buggy parameters, the sooner the calling code will get fixed.

Glibc change exposing bugs

Posted Nov 11, 2010 2:05 UTC (Thu) by gus3 (guest, #61103) [Link] (10 responses)

I got that test backwards. It should be:

if ((p1 + length >= p2) || (p2 + length >= p1)) {
    crash_and_burn();
}

But goofing the test doesn't mean the test isn't simple.

Glibc change exposing bugs

Posted Nov 11, 2010 2:12 UTC (Thu) by gus3 (guest, #61103) [Link] (9 responses)

Aaaaand I've still goofed it. I'm not taking into account... well, everything.

I see the actual test in my head, but I can't code it right now due to fatigue. But even with all necessary calculations, being integer math, it'll take no more than a few tens of cycles. Even on a register-starved x86, putting a couple temporary variables on the stack will only pollute the cache, before over-writing the temps anyway. It shouldn't take more than a microsecond to check for overlap.
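For reference, one way the overlap test could be written is sketched below. It assumes both regions have the same length n, as in a memcpy() call, and compares addresses as integers so that the subtraction cannot wrap. regions_overlap is a hypothetical helper name, not anything in glibc.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true when [a, a+n) and [b, b+n) share at least one byte.
 * regions_overlap is a hypothetical helper name, not a glibc function. */
static bool regions_overlap(const void *a, const void *b, size_t n)
{
    uintptr_t pa = (uintptr_t)a;
    uintptr_t pb = (uintptr_t)b;
    /* Distance between the two start addresses, computed without wrapping. */
    uintptr_t gap = pa < pb ? pb - pa : pa - pb;

    return n != 0 && gap < n;
}
```

That is a compare, a subtraction, and another compare; whether that cost is acceptable on every small copy is exactly what this thread is arguing about.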

Glibc change exposing bugs

Posted Nov 11, 2010 7:24 UTC (Thu) by nix (subscriber, #2304) [Link] (6 responses)

A few tens of cycles! That's much longer than an in-cache memcpy() on a small input takes: and most of its inputs are small.

There is absolutely no chance that the glibc devs would ever accept this except in the -lc_g version of the library (which nobody ever uses).

Glibc change exposing bugs

Posted Nov 12, 2010 23:04 UTC (Fri) by gus3 (guest, #61103) [Link] (5 responses)

It shouldn't be difficult to put it in as debugging code, but this isn't normal debugging. The GNU people should own up to having violated the documentation on their code. One year of making noise, or maybe even just six months, should be long enough for developers to clean up their code from this erroneous assumption.

Glibc change exposing bugs

Posted Nov 14, 2010 22:29 UTC (Sun) by nix (subscriber, #2304) [Link] (4 responses)

> The GNU people should own up to having violated the documentation on their code.

What on earth? The relevant documentation for memcpy() is ISO C, incorporated by reference into all versions of POSIX.1. This clearly states "If copying takes place between objects that overlap, the behavior is undefined."

This isn't an obscure or hard-to-interpret part of the Standard. Undefined, bang, that's it. Perhaps you are operating under the misapprehension that the linux manpages project, a descriptive effort, not a prescriptive one, is in some way binding on glibc? It isn't. It really isn't. It isn't binding on anything.

Glibc change exposing bugs

Posted Nov 15, 2010 0:00 UTC (Mon) by promotion-account (guest, #70778) [Link] (3 responses)

> What on earth? The relevant documentation for memcpy() is ISO C, incorporated by reference into all versions of POSIX.1.

Indeed.

Linux man-pages are only authoritative for the kernel system-calls (more precisely, their glibc thin layer). The rest of the APIs are only included for convenience: they are a secondary source to the primary source references residing in the 'CONFORMING TO' section.

Glibc change exposing bugs

Posted Nov 15, 2010 0:31 UTC (Mon) by nix (subscriber, #2304) [Link] (2 responses)

> Linux man-pages are only authoritative for the kernel system-calls (more precisely, their glibc thin layer).

No, even those are descriptive. Perhaps the glibc texinfo documentation would be authoritative for that, if it was ever maintained by anyone. As it is, I think only Ulrich and Roland's brains are authoritative for glibc.

Glibc change exposing bugs

Posted Nov 15, 2010 1:27 UTC (Mon) by promotion-account (guest, #70778) [Link] (1 responses)

I remember finding a good number of system-call manpages discussions in LKML. Otherwise, where should we find documentation for things like futexes, netlink sockets, and the rest?

For good or bad, these manpages are the 'most primary' sources available for such topics, only beside the code.

But unfortunately these man-pages do not always exist. I once had to carefully study the bluez userspace code to know how to best interface with the kernel Bluetooth API (undocumented AF_BLUETOOTH sockets, undocumented netlink interfaces, etc).

Glibc change exposing bugs

Posted Nov 15, 2010 10:38 UTC (Mon) by nix (subscriber, #2304) [Link]

Yes, they are the most useful documentation we have, especially for things that do not have glibc wrappers. But even for the kernel they are descriptive, and for things for which the glibc wrappers are the primary implementation (like readdir()) or for which there is no kernel component, the manpages are completely after-the-fact. (As far as I can tell the glibc project no longer bothers to document anything at all. There are lots of utterly undocumented things in glibc's allegedly public interface.)

Glibc change exposing bugs

Posted Nov 11, 2010 18:52 UTC (Thu) by oak (guest, #2786) [Link] (1 responses)

memmove() has this check you're clamoring for... And if the given areas don't overlap, it calls memcpy().

Glibc change exposing bugs

Posted Nov 15, 2010 0:14 UTC (Mon) by promotion-account (guest, #70778) [Link]

> memmove() has this check you're clamoring for... And if the given areas don't overlap, it calls memcpy().

Sometimes even if the areas do overlap, it calls memcpy(). This happens if the library has internal knowledge about memcpy()'s copying direction.

A common example is when src > dst, copying is forward, and the CPU block transfer unit is smaller than or equal to (src - dst). x86-64 CPUs support copying up to 8-byte blocks in one opcode (movsq), assuming no floating-point ops in use, which is usually the case with kernel code.
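To illustrate why the copying direction matters, here is a minimal byte-wise sketch of a memmove()-style routine. It is not glibc's implementation; sketch_memmove is a hypothetical name, and a real library would copy in larger blocks.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte-wise memmove-style copy; sketch_memmove is a hypothetical name. */
static void *sketch_memmove(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if ((uintptr_t)d <= (uintptr_t)s) {
        /* dst at or below src: a forward copy reads each byte before it is overwritten. */
        for (size_t i = 0; i < n; i++)
            d[i] = s[i];
    } else {
        /* dst above src: copy backward so the tail of src is read first. */
        for (size_t i = n; i > 0; i--)
            d[i - 1] = s[i - 1];
    }
    return dst;
}
```

A memcpy() that always runs in one direction gives the right answer for only one of these two cases when the regions overlap, which is why code that happened to work with a forward copy can break with a backward one.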

Glibc change exposing bugs

Posted Nov 12, 2010 6:34 UTC (Fri) by jmm82 (guest, #59425) [Link] (1 responses)

That is two tests and one branch.

Glibc change exposing bugs

Posted Nov 12, 2010 22:56 UTC (Fri) by gus3 (guest, #61103) [Link]

Given that the total calculations will be less than one memory page, or one cache line, and the tests can be marked in code as "likely to fail" (that is, no crash and burn), Intel's post-486 processors will already be fetching the normal (everything's OK) code by the time the final test calculations are done. The branch prediction will fail on the crash-and-burn case.

Glibc change exposing bugs

Posted Nov 11, 2010 11:13 UTC (Thu) by marcH (subscriber, #57642) [Link] (5 responses)

> It is an incompatible change in the ABI.

No it is not, unless "defined-undefined behaviour" has now become part of Interfaces.

By using the wrong name you are trying to sidestep all the nuances of this problem. Unfair tactics lowering your credibility.

Glibc change exposing bugs

Posted Nov 11, 2010 12:52 UTC (Thu) by nye (guest, #51576) [Link] (4 responses)

>No it is not, unless "defined-undefined behaviour" has now become part of Interfaces.

Deterministic observed behaviour, like it or not, will always be considered a part of the ABI.

This is why the kernel goes out of its way to preserve observed but undocumented behaviour, and one of the reasons Windows is wildly successful despite its numerous design flaws is that Microsoft agrees.

If a change breaks existing software, then it's a regression. Hand-wringing, finger-pointing, and bitter recriminations about 'proprietary crapware' are all irrelevant. Something worked. Now it doesn't.

From the comments on this it sounds like symbol versioning could be used to avoid this problem altogether, while still getting the benefit for newly built applications. Developers don't want to do this because they feel that it will benefit only proprietary software[0]. Of course the only people harmed by this attitude are end users.

This is just yet another case where open source software chooses politics over technical excellence, which is sad but entirely unsurprising.

[0] Disregarding the idea that one might want to use some open source software with a similar bug that hasn't yet been fixed - most developers seem to always want to run the latest bleeding edge version of everything, and don't understand that the rest of the world isn't like that and expects existing software not to break unexpectedly.

Glibc change exposing bugs

Posted Nov 11, 2010 18:27 UTC (Thu) by donwaugaman (subscriber, #4214) [Link]

> This is just yet another case where open source software chooses politics over technical excellence, which is sad but entirely unsurprising.

Oddly enough, I would consider "technical excellence" to mean fixing bugs in software that has them, in this case the Adobe Flash player, whereas "politics" means allowing poorly-written precedent to trump (and in this case penalize) better performance for programs written with an eye to the standard.

It's a shame there's no way to get Adobe to do an 's/memcpy/memmove/' on their codebase. But the fact that they won't let others do it has more to do with their politics (and opposition to software freedom) than about technical excellence.

Glibc change exposing bugs

Posted Nov 11, 2010 18:32 UTC (Thu) by xilun (guest, #50638) [Link] (2 responses)

> Deterministic observed behaviour, like it or not, will always be considered a part of the ABI.

Nonsense. The definition of the ABI is NOT any random characteristic that would please you by making others responsible for your own errors.

First, if you want to program in a language (and its associated standard library) that is not full of undefined behaviors, then you don't program in C.

If you do program in C, *YOU* are responsible for respecting preconditions. The system will *NOT* magically fix your bugs for you. Glibc developers are not responsible for the Flash software package, and this is not a free software or not free software problem; they are also not responsible for other random pieces of free software, even crappy ones.

You are inverting the roles, giving the bad one to the developers of the very high quality, standards-compliant piece of code that is glibc, and the good one to the consistently notable piece of crap that is the Flash player.

And even if you could be a dictator for the glibc project, please explain to us what policy you would then impose to *concretely* solve the general problem of unintended interactions between software components. Nobody has ever solved that. Even Microsoft, which you seem to cherish so much, has for *years* (maybe even decades now) asked third-party developers to ship the MS libc that they tested their application with. So of course most of the time you don't have this problem of a suddenly changing libc under MS Windows, because each program has its own libc. Now what happens in case some version contains a security exploitable flaw? Security efforts are duplicated in such an environment. (And this is just an example.)

Imagine a random application depends on a BUG of a particular version of the glibc. Because of that, you are asking for this bug to remain forever? Nonsense. This is what you call "technical excellence"? What a joke. It would mean freezing libraries forever, because that's the only way you can guarantee in the shared library model that the behavior of any random piece of crappy software won't change too much.

If you're a third party developer who only cares about making your proprietary app sort of work even when you write highly faulty, unacceptable quality code, please: 1/ ship it with a frozen version of the library it needs, like you do under Windows anyway 2/ leave the developers of libraries that are trying to improve them alone, and especially do not blame YOUR mistakes on them.

I'm not sure nye meant all regressions need to be avoided.

Posted Nov 12, 2010 2:50 UTC (Fri) by gmatht (subscriber, #58961) [Link] (1 responses)

> So of course most of the time you don't have this problem of a suddenly changing libc under MS Windows, because each program has its own libc. Now what happens in case some version contains a security exploitable flaw? Security efforts are duplicated in such an environment. (And this is just an example.)

This seems almost a strawman. Criticize Microsoft's policy if you want, but all nye suggested was using symbol versioning for this particular known-to-be-dangerous change. This wouldn't cause any security issues (and may even avoid some).

I agree that it isn't possible to avoid every regression. For example, newer software often has performance regressions on old hardware. However, this seems like a particularly serious regression, so if there was an easy way to stop old versions of software silently corrupting data it may be worth taking.

I'm not sure nye meant all regressions need to be avoided.

Posted Nov 12, 2010 11:13 UTC (Fri) by xilun (guest, #50638) [Link]

Symbol versioning is when the API/ABI changes. Here, it has not.

Glibc change exposing bugs

Posted Nov 11, 2010 0:00 UTC (Thu) by nix (subscriber, #2304) [Link] (3 responses)

Note that glibc broke the JVM back in glibc 2.3.x days, when it made its internal symbols private and suddenly the JVM was discovered to have been relying on internal libc symbols. That made it not *start* at all, and at that point it was very closed-source, so nobody could fix it. Inevitably there were calls to revert that change and make all the glibc symbols public again. Sensibly, the glibc developers didn't do that: instead, the idiots using internal libc symbols fixed their bugs.

The same will happen here.

Glibc change exposing bugs

Posted Nov 11, 2010 0:08 UTC (Thu) by lmb (subscriber, #39048) [Link] (2 responses)

A start-up failure is acceptable. A straight crash might be OK, if it is unavoidable (even though then I'd already worry). Data corruption - compared to previous behavior - is not.

Of course it will cause the code to be fixed. But that the maintainers of the core system library place "I am right" above users' data is a worrying insight.

Glibc change exposing bugs

Posted Nov 11, 2010 0:26 UTC (Thu) by bojan (subscriber, #14302) [Link]

> Data corruption - compared to previous behavior - is not.

Now you're making glibc maintainers responsible for other people's bugs. They are not.

If this same buggy program was linked against some other library that implements memcpy() similarly to the way latest glibc does, the data would be just as corrupt.

In essence, it is the program that is corrupting the data, not glibc. And it's doing so by clear misuse of a function.

> But that the maintainers of the core system library place "I am right" above users' data is a worrying insight.

I think that's a bit overly dramatic. Fedora 14 is a fresh release, currently carrying a non-released version of glibc. As such, users of it (which includes me) sometimes encounter things that are surprising at first. But the audience is limited and the impact is not earth shattering.

Glibc change exposing bugs

Posted Nov 11, 2010 10:53 UTC (Thu) by nix (subscriber, #2304) [Link]

The glibc maintainers are much harsher than that. Any behaviour that is implementation-defined in POSIX or ISO C's library functions is fair game for them to change, and they have very harsh words for people who complain (and explicitly don't care about breaking closed-source code that depends on such assumptions). Expecting them to insert slowing-down hacks for actively *undefined* stuff, given their somewhat cavalier attitude to implementation-defined stuff, is peculiar. (Their attitude to stuff that *is* defined is appropriately rigid: thou shalt not break it.)

Glibc change exposing bugs

Posted Nov 10, 2010 22:55 UTC (Wed) by clugstj (subscriber, #4020) [Link]

It wasn't "unspecified behavior", the manpage, in the first paragraph, says to not do this!

Glibc change exposing bugs

Posted Nov 10, 2010 23:56 UTC (Wed) by nix (subscriber, #2304) [Link]

The code happened to work on Linux. It certainly wouldn't have worked on a lot of other OSes, even Unixes. It's undefined behaviour: it might be broken at any time, without warning. And now it has. Compiler optimizations could perfectly well have broken it instead: perhaps we should compile everything -O0 to remove the chance of that.

Emitting a warning to the logs is far too expensive: this stuff is so often called that the compiler sometimes open-codes it! Adding a conditional in there that isn't absolutely needed would have horrible effects on performance.

(And as for detecting it at compile time, well, sure! It requires whole-program optimization of every single program and all its shared libraries, and even then detecting it reliably reduces to solving the halting problem. This seems to be rather harder than just valgrinding the bloody thing and learning elementary C before you write it.)

(Sure, there will be actual bugs teased out by this: code that didn't expect to receive overlapping regions when it was written, but that now is. But, guess what? I bet those overlapping copies were causing other bugs, because it is surely rare for code to just memcpy() from region A to *unexpectedly*-overlapping region B and then never do anything with A again.)

Glibc change exposing bugs

Posted Nov 12, 2010 14:05 UTC (Fri) by Wol (subscriber, #4433) [Link]

At an absolute minimum, a debug assert.

So that if any programmer is doing what they should, the system is going to fail under test.

Cheers,
Wol
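A debug-only check along those lines could be sketched as below, assuming the standard assert() machinery so that the test disappears entirely when building with -DNDEBUG. memcpy_checked is a hypothetical wrapper name for illustration; nothing like it exists in glibc as far as this thread says.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical wrapper: asserts non-overlap in debug builds and is a plain
 * memcpy() call when compiled with -DNDEBUG. */
static inline void *memcpy_checked(void *dst, const void *src, size_t n)
{
    uintptr_t d = (uintptr_t)dst;
    uintptr_t s = (uintptr_t)src;

    /* The regions are disjoint when the start addresses are at least n bytes apart. */
    assert(n == 0 || (d < s ? s - d : d - s) >= n);
    return memcpy(dst, src, n);
}
```

The catch, of course, is that this only fires for code that is actually exercised with overlapping buffers during testing.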

Glibc change exposing bugs

Posted Nov 10, 2010 20:06 UTC (Wed) by HappyCamp (guest, #29230) [Link]

It may also be an issue with a bug in Glibc:

From:
https://bugzilla.redhat.com/show_bug.cgi?id=638477#c74

H.J. Lu 2010-11-10 15:00:40 EST
Comment 74

64bit Fedora 14 is pretty much broken on machines with
SSE4.2. I ran into random crashes with 64bit Fedora 14 on
Intel Core i7. It turns out that 64bit strncasecmp
is broken on machines with SSE 4.2:

https://bugzilla.redhat.com/show_bug.cgi?id=651638

Glibc change exposing bugs

Posted Nov 10, 2010 20:21 UTC (Wed) by MisterIO (guest, #36192) [Link] (10 responses)

By the way, do we really need a libc with so much hand-written assembly?

Glibc change exposing bugs

Posted Nov 10, 2010 20:24 UTC (Wed) by jwb (guest, #15467) [Link] (9 responses)

Yes. If you'd like to know why, recompile your system with cpu=generic and use it for a while.

Glibc change exposing bugs

Posted Nov 10, 2010 20:51 UTC (Wed) by MisterIO (guest, #36192) [Link] (8 responses)

(!hand-written assembly) != (cpu=generic)

But anyway, my argument was not just !assembly, it was also not so much assembly. Look at the one proposed by Linus:
void *memcpy(void *dst, const void *src, size_t size)
{
      void *orig = dst;
      asm volatile("rep ; movsq"
          :"=D" (dst), "=S" (src)
          :"0" (dst), "1" (src), "c" (size >> 3)
          :"memory");
      asm volatile("rep ; movsb"
          :"=D" (dst), "=S" (src)
          :"0" (dst), "1" (src), "c" (size & 7)
          :"memory");
      return orig;
}

It may not be all that well tested, but it's simple enough to be comprehensible.

Glibc change exposing bugs

Posted Nov 10, 2010 21:15 UTC (Wed) by jwb (guest, #15467) [Link] (5 responses)

I'm not sure what you're driving at. The problem of the current article is not that memcpy is written in asm, it's that the memcpy runs backwards. This has changed the semantics of the function and is unrelated to how it is implemented.

Glibc change exposing bugs

Posted Nov 10, 2010 21:20 UTC (Wed) by JoeBuck (subscriber, #2330) [Link] (3 responses)

No, the semantics of the function are that the behavior is not defined if the source and destination strings overlap, as the relevant standards and the man page clearly state. That's why there's an alternative function named memmove. If you write C, call memcpy, and the arguments overlap, you've written a non-portable program.

Glibc change exposing bugs

Posted Nov 10, 2010 21:55 UTC (Wed) by nix (subscriber, #2304) [Link] (2 responses)

Nah, it would be non-portable if it were implementation-defined whether memcpy() worked for overlapping regions. Since it is undefined, what you have written when you use a memcpy() on overlapping regions is technically (in the most pedantic mode imaginable) not C at all.

(I know you know this, this is really for others reading)

Glibc change exposing bugs

Posted Nov 11, 2010 1:19 UTC (Thu) by tialaramex (subscriber, #21167) [Link] (1 responses)

I think it /is/ C, but it isn't clear what the C means. This code doesn't break any of the rules of the C language which would cause it not to parse. Otherwise we wouldn't have got into this mess because it wouldn't compile.

[ Sadly I don't trust the Adobe developers enough to imagine that a diagnostic from static analysis would have stopped them doing this. I think "Warning: abuse of memcpy()" would have scrolled by with hundreds of other warnings they ignore... ]

In the same way "Don't frabidulate the wugs, she's my uncle" is an English sentence, but it isn't clear what it means. You can parse it, and you can answer some questions about it, e.g. "Are you being asked to frabidulate the wugs?" but there are big unknowns.

Glibc change exposing bugs

Posted Nov 11, 2010 7:21 UTC (Thu) by nix (subscriber, #2304) [Link]

> This code doesn't break any of the rules of the C language which would cause it not to parse.

It is true that the standard does not require a diagnostic in this case, and that providing a diagnostic in all cases at compile time is impossible, but that doesn't make it any less 'not C'. C is not just 'what the compiler happens to accept'.

(I completely agree that in this particular case a warning would likely have been useless unless accompanied by a brickbat.)

btw, HJ's trying to fix the underlying problem here.

Glibc change exposing bugs

Posted Nov 10, 2010 21:21 UTC (Wed) by MisterIO (guest, #36192) [Link]

I'm not sure what _you_ are driving at! Read again the first message of mine that you commented on and you should get the general tone of my comment.

Glibc change exposing bugs

Posted Nov 10, 2010 22:04 UTC (Wed) by joib (subscriber, #8541) [Link]

And, this has a couple of other nice advantages

1) Less I$ pollution. You won't see this in a memcpy() benchmark, but what about a more realistic workload?

2) Give some incentive to CPU makers to optimize the simple rep mov instead of requiring ever more fancy unrolled loops written in the latest instruction set extension. :)

Glibc change exposing bugs - a bug in proposed memcpy

Posted Nov 16, 2010 16:45 UTC (Tue) by promotion-account (guest, #70778) [Link]

Look at the one proposed by Linus:
void *memcpy(void *dst, const void *src, size_t size)
{
      void *orig = dst;
      asm volatile("rep ; movsq"
          :"=D" (dst), "=S" (src)
          :"0" (dst), "1" (src), "c" (size >> 3)
          :"memory");
      asm volatile("rep ; movsb"
          :"=D" (dst), "=S" (src)
          :"0" (dst), "1" (src), "c" (size & 7)
          :"memory");
      return orig;
}

For completeness, this should have an "rcx" clobber, or GCC may believe that this important register will not change after each assembly snippet. Such a bug may get triggered if GCC aggressively inlined the code, which occurs in a good number of cases given its optimizer competency.
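One way to tell the compiler that the count register changes is sketched below. As far as I know, GCC does not accept a clobber that overlaps an input operand, so this sketch uses read-write ("+") operands for rdi, rsi and rcx instead of an explicit "rcx" clobber. rep_movs_copy is a hypothetical name, the copy is forward-only like the original, and the non-x86-64 fallback is only there so the sketch builds elsewhere.

```c
#include <stddef.h>

/* rep_movs_copy is a hypothetical name; forward copy only, like the original. */
static void *rep_movs_copy(void *dst, const void *src, size_t size)
{
    void *orig = dst;
#if defined(__x86_64__) && defined(__GNUC__)
    size_t qwords = size >> 3;   /* number of 8-byte blocks */
    size_t tail = size & 7;      /* remaining bytes */

    /* "+D", "+S", "+c": rdi, rsi and rcx are read and then modified by rep movs. */
    __asm__ volatile("rep movsq"
                     : "+D"(dst), "+S"(src), "+c"(qwords)
                     :
                     : "memory");
    __asm__ volatile("rep movsb"
                     : "+D"(dst), "+S"(src), "+c"(tail)
                     :
                     : "memory");
#else
    /* Portable fallback so the sketch still builds on other targets. */
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (size--)
        *d++ = *s++;
#endif
    return orig;
}
```

Keeping the counts in named variables also means the compiler never assumes that size >> 3 or size & 7 is still sitting in rcx after the instruction has run.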

--Darwish

Glibc change exposing bugs

Posted Nov 10, 2010 20:45 UTC (Wed) by Rubberman (guest, #70320) [Link]

It has been published since the beginning of time that one does NOT use memcpy() when source and target overlap, and that memmove() is to be used in such cases. Caveat programmer! From the memcpy() man page:

[quote]
The memcpy() function copies n bytes from memory area src to memory area dest. The memory areas
should not overlap. Use memmove(3) if the memory areas do overlap.
[/quote]
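A typical case where the regions do overlap is shifting data within a single buffer. The sketch below removes one element from an int array by sliding the tail down one slot, which is exactly the situation the man page says calls for memmove(). remove_at is a hypothetical helper name chosen for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Remove the element at index pos from an int array of len elements by
 * shifting the tail down one slot. Source and destination overlap, so
 * memmove() is required. remove_at is a hypothetical helper name. */
static size_t remove_at(int *arr, size_t len, size_t pos)
{
    if (pos >= len)
        return len;   /* nothing to remove */
    memmove(&arr[pos], &arr[pos + 1], (len - pos - 1) * sizeof arr[0]);
    return len - 1;
}
```

Swapping memmove() for memcpy() here would compile without complaint and might even appear to work, depending on the direction the library happens to copy in.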

Nice timing!

Posted Nov 10, 2010 21:24 UTC (Wed) by proski (subscriber, #104) [Link]

I was recompiling pulseaudio in another window to debug that problem! That's why I like LWN so much!

Glibc change exposing bugs

Posted Nov 10, 2010 21:25 UTC (Wed) by ikm (guest, #493) [Link] (6 responses)

I was under the impression that GCC actually generates custom inline code for memcpy() calls. I wonder what the actual state of things is.

Glibc change exposing bugs

Posted Nov 10, 2010 21:47 UTC (Wed) by jwb (guest, #15467) [Link] (3 responses)

Are you sure Flash Player is compiled with GCC? Last I heard, Flash Player is built and released on a Gentoo box, so its development is clearly well out of the mainstream of the Linux toolchain.

Gentoo is fairly mainstream

Posted Nov 10, 2010 23:37 UTC (Wed) by alex (subscriber, #1355) [Link]

Gentoo's not that far out of the mainstream; in fact it's closer to "what source intended" from what most open source projects release. And it certainly uses gcc as its base.

Glibc change exposing bugs

Posted Nov 10, 2010 23:56 UTC (Wed) by gerdesj (subscriber, #5446) [Link]

>Are you sure Flash Player is compiled with GCC? Last I heard, Flash Player is built and released on a Gentoo box, so its development is clearly well out of the mainstream of the Linux toolchain.

Cheap shot.

gcc version 4.4.5 (Gentoo 4.4.5 p1.0, pie-0.4.5)

But I do get the choice of something else if I want it - not what is rammed down my throat by the "mainstream".

I also get to support it ...

On the bright side, if your statement is true about release by Gentoo then I get a better chance of Flash working than you do - oh look no snags with Youtube audio.

Cheers
Jon

Glibc change exposing bugs

Posted Nov 14, 2010 5:33 UTC (Sun) by dirtyepic (guest, #30178) [Link]

Last time I checked Gentoo was built with GCC.

(Gentoo toolchain dev)

Glibc change exposing bugs

Posted Nov 10, 2010 21:54 UTC (Wed) by joib (subscriber, #8541) [Link] (1 responses)

IIRC only if the size is known at compile time, and below some limit.

Glibc change exposing bugs

Posted Nov 17, 2010 15:02 UTC (Wed) by meuh (guest, #22042) [Link]

But once inlined, those functions can't be overridden with an LD_PRELOAD module nor with a tool like valgrind. And the overlap will happen behind the user's back.

The real problem here

Posted Nov 10, 2010 23:38 UTC (Wed) by bojan (subscriber, #14302) [Link] (5 responses)

Is, of course, the fact that flash is not open source, so the bug cannot be easily fixed. If flash was an open source package in Fedora, the function use would be changed to memmove(), package would be rebuilt and issued as an update. And nobody would be talking about preserving the old memcpy() behaviour at all.

The real problem here

Posted Nov 11, 2010 12:58 UTC (Thu) by nye (guest, #51576) [Link] (3 responses)

>And nobody would be talking about preserving the old memcpy() behaviour at all.

Which would of course be a great shame, because avoidably breaking existing applications is wrong, regardless of whether that program has a hidden bug in it or not.

The real problem here

Posted Nov 11, 2010 18:41 UTC (Thu) by xilun (guest, #50638) [Link]

So why don't you just freeze all the programs you use forever and never upgrade them again on your computers? That would give you this magical property...

Two schools

Posted Nov 12, 2010 13:01 UTC (Fri) by tialaramex (subscriber, #21167) [Link] (1 responses)

Your belief is sometimes called the "Raymond Chen" school of software engineering, although mostly because he's documented it rather than because he's in some way responsible for the Windows team taking this approach.

You can get yourself tied in some terrible knots this way. Chen's blog The Old New Thing is currently documenting how starting from [let's not annoy CP/M programmers] got them to [making an OS component optional causes security vulnerabilities in third party programs], over the course of a decade or so. Every step along the way is completely rational but the result is a confusing, insecure mess that's hard to reform.

But the alternative school, where everything not tied down and documented is up for grabs, and the tied down stuff might be cut loose and "deprecated" with relatively little notice, causes its fair share of problems, as we've seen with the thread's topic.

Let me say this: It is very far from clear which of the alternatives here is better for anyone, from users to developers to OS vendors, let alone which would be best for all.

Two schools

Posted Nov 12, 2010 17:48 UTC (Fri) by jzbiciak (guest, #5246) [Link]

I can just imagine trying to convince the glibc folks to autodetect SimCity to dynamically change how free() works.

The real problem here

Posted Nov 19, 2010 2:14 UTC (Fri) by linuxrocks123 (subscriber, #34648) [Link]

Well, actually, in this particular case you could probably rewrite the binary to call memmove instead of memcpy fairly easily if you really wanted to.

---linuxrocks123

Glibc change exposing bugs

Posted Nov 11, 2010 0:32 UTC (Thu) by kunitz (subscriber, #3965) [Link] (5 responses)

Linus did a performance test of his simple memcpy() against the new glibc version. He couldn't find strong evidence that the new glibc version is actually faster. I would question optimizations that have no measurable benefit but carry the downside of breaking running code that relies on undefined behaviour.

Glibc change exposing bugs

Posted Nov 11, 2010 0:44 UTC (Thu) by bojan (subscriber, #14302) [Link] (4 responses)

According to some comments above, he was testing on a CPU that didn't benefit from the optimisations (Core i5 instead of Core 2/Atom), so he couldn't have seen any improvements anyway.

Glibc change exposing bugs

Posted Nov 11, 2010 7:20 UTC (Thu) by kunitz (subscriber, #3965) [Link] (3 responses)

If that's the case, the optimization shouldn't be enabled on Core i5. I guess it's more difficult to test for the CPU than for SSE 4.x presence.

Glibc change exposing bugs

Posted Nov 11, 2010 18:23 UTC (Thu) by oak (guest, #2786) [Link]

> If that's the case the optimization shouldn't be enabled on Core i5

What would be the point of slowing down memcpy for all CPUs (by adding an extra check for cached CPU type variable value)? As long as the change doesn't slow down things for other CPUs, and considerably speeds it up on some, it sounds fine...

Glibc change exposing bugs

Posted Nov 12, 2010 0:37 UTC (Fri) by jamesh (guest, #1159) [Link] (1 responses)

That would mean that a developer using a Core i5 will not be able to reproduce some bugs seen by users on Core 2 and Atom chips. Is that actually desirable?

Glibc change exposing bugs

Posted Nov 12, 2010 10:51 UTC (Fri) by marcH (subscriber, #57642) [Link]

No, but something like this happens every time developers rely on undefined behaviour, so we just have to live with it.

Glibc change exposing bugs

Posted Nov 11, 2010 2:06 UTC (Thu) by gmaxwell (guest, #30048) [Link]

I've run quite a bit of code through valgrind, and I don't believe I've ever actually seen the overlapping memcpy error. Even though it seems like an easy mistake, it would seem that (almost) no one actually makes it.

That the error was found in flash, of all places, is not surprising. It's also scary that no one at Adobe has been running flash in valgrind (but also not surprising).

Glibc change exposing bugs

Posted Nov 11, 2010 5:33 UTC (Thu) by PaulWay (subscriber, #45600) [Link] (5 responses)

Interesting that Linus didn't patch it by simply calling memmove with the same arguments and passing back the same return.
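
A minimal sketch of the shim described in the previous paragraph, assuming a system where a preloaded shared object can interpose memcpy(). It is not the workaround posted in the bug report, only the simpler variant suggested here.

```c
#include <stddef.h>
#include <string.h>

/* Interposed memcpy(): forward the same arguments to memmove(), which is
 * defined for overlapping regions, and pass its return value back.
 * Only calls that go through the dynamic symbol are affected; copies the
 * compiler expanded inline never reach this function. */
void *memcpy(void *dest, const void *src, size_t n)
{
    return memmove(dest, src, n);
}
```

Built as a shared object (for example `gcc -shared -fPIC -o shim.so shim.c`, where the file names are placeholders) and loaded with `LD_PRELOAD=./shim.so`, this would replace the library's memcpy() for that process.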

Interesting that no-one's suggested we fix the obviously ambiguous wording in the man page. It seems that trusting the C programmer to know the difference between memcpy and memmove - whose names do _not_ imply anything about their behaviour - is a bad thing. Rusty's Hierarchy of API Design scores another victim, and yet no-one wants to fix either the API, the documentation or the behaviour.

Interesting that the question of why backwards-copying is necessary remains (AFAICS) unanswered. Has anyone actually tested whether the loop can be written with a forwards-copy and whether it performs better or worse than the backwards-copy and/or the Linus brute-force method?

Interesting that everyone who doesn't want to change memcpy to do checks or warn or anything asserts, without much actual evidence, that it would be a Bad Thing. Citation needed, or at least some crude benchmarks or numbers.

In my uninformed opinion it would be better to have one version of memcpy (memmove implies that the memory is absent from the source once completed, which is not true) that does the checks. The tiny overhead will be nothing compared to the page faults you're almost certainly incurring with repeated use. Run some tests, real world or otherwise, to see whether it really makes any difference. If it doesn't, make memmove a defined alias for memcpy, update the documentation, and everyone wins. The API remains the same, badly written applications don't die because of an underlying implementation change, lazy programmers have their arses saved, everyone wins.

But what do I know? I'm still writing the test.

Have fun,

Paul

Glibc change exposing bugs

Posted Nov 11, 2010 9:54 UTC (Thu) by mpr22 (subscriber, #60784) [Link] (2 responses)

That particular kind of lazy programmer deserves to lose, unfortunately.

Glibc change exposing bugs

Posted Nov 12, 2010 0:17 UTC (Fri) by PaulWay (subscriber, #45600) [Link] (1 responses)

True. I for one think the best solution to this whole debacle is for Adobe to fix their code. But we can't do anything about that, and I don't hold any hope of Adobe putting this at any higher priority than 'do it after learning Etruscan'.

I personally think we should do more tests with this kind of thing. Use the same LD_PRELOAD trick that Linus used to fix the problem to see if any other applications are assuming that memcpy will work on overlapping regions. See if we can find any other little abuses of standards, or ambiguities in them, that might catch us out in the future. And not just fix the code, but fix the standard. No-one, I hope, is saying that people should be using memcpy as if it were overlap-safe - just that existing code which does is sort of exempt from criticism.
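
A rough sketch of what such a testing shim could do, under the assumption that it is acceptable to pay for one extra comparison and a log line per call. The names checked_memcpy, regions_overlap and overlap_count are hypothetical; in a real preload library the wrapper would be exported under the name memcpy.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Number of overlapping calls seen so far (hypothetical counter). */
static unsigned long overlap_count;

/* True if [a, a+n) and [b, b+n) share at least one byte. Identical
 * pointers with n > 0 count as overlapping too. */
static int regions_overlap(const void *a, const void *b, size_t n)
{
    uintptr_t pa = (uintptr_t)a, pb = (uintptr_t)b;
    uintptr_t dist = pa > pb ? pa - pb : pb - pa;
    return n != 0 && dist < n;
}

/* Report calls that violate the no-overlap rule, then do the copy with
 * memmove() so the calling program keeps running. */
void *checked_memcpy(void *dest, const void *src, size_t n)
{
    if (regions_overlap(dest, src, n)) {
        overlap_count++;
        fprintf(stderr, "memcpy overlap: dest=%p src=%p n=%zu\n",
                dest, src, n);
    }
    return memmove(dest, src, n);
}
```

Valgrind's memcheck reports the same class of error, so a shim like this is mainly useful where running under valgrind is too slow or not possible.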

Ah well, maybe we'll all look back on this and laugh.

Have fun,

Paul

Glibc change exposing bugs

Posted Nov 12, 2010 7:41 UTC (Fri) by hozelda (guest, #19341) [Link]

"Fixing" the standard, if you mean going back to ISO C, might not be very likely to happen.

Glibc change exposing bugs

Posted Nov 11, 2010 12:15 UTC (Thu) by alankila (guest, #47141) [Link]

You make perfect sense.

But sadly, I don't think it is possible to make people here accept that simply aliasing memcpy() to memmove() is actually the best solution. I bet the difference wouldn't show in anything but carefully constructed microbenchmarks, and yet we would be able to squash a whole class of bugs at once.

However, I do believe that the best use of increased CPU power is to spend it on simplifying the system, because that allows raising the complexity bar somewhere else higher up. (I believe in complexity budget: a finite number of things are possible. You are best off spending that complexity budget on features close to the user than on those close to the metal, so making people not have to care about difference for memmove() vs. memcpy() allows them to spend time caring about something that's far more useful.)

Glibc change exposing bugs

Posted Nov 11, 2010 17:24 UTC (Thu) by RobSeace (subscriber, #4435) [Link]

> memmove implies that the memory is absent from the source once completed,
> which is not true

Well, actually, it IS kind of true... If the regions overlap (which is the sole
point of using memmove()), then it implies that the source region will indeed
no longer contain the data it previously contained, since at least part of it
would've been overwritten by the memmove() to the (overlapping) destination
region...

I side with the glibc people: any C programmer worth a damn knows better than
to use memcpy() on overlapping regions... Anyone that does so is writing known
buggy code that will fail to work on many systems... That it just happened to
have worked by chance until now on glibc doesn't matter a bit... There are
lots of subtle bugs you can make that appear to work fine until something
changes and exposes them... You see it in buffer overflows all the time; if
you overflow just a small amount and there's some meaningless variable there
in memory that you overflow into, no harm done, no crashing, no noticeable bad
behavior at all... But, compile with a different optimization level, or change
the code in a certain way, and BAM!, that variable is no longer there to catch
the overflow, and you end up trashing something important... Now, are you
seriously going to say GCC should support such obviously buggy code by making
sure to always continue laying out variables in memory just as it did the
first time, so that the overflow causes no harm? If not, then how is this
glibc change any different at all?

Glibc change exposing bugs

Posted Nov 11, 2010 11:01 UTC (Thu) by slashdot (guest, #22014) [Link] (7 responses)

Are we sure that the check is expensive?

If the data is less than 128 bytes, it can be just all read into SSE2 registers and then written out, which handles overlap fine.

Otherwise, you can just check (size_t)(src - dst) >= (size_t)length, which shouldn't be that expensive compared to the copy.

But anyway, why is a backward copy supposed to be faster? It would seem pretty silly to design a CPU such that copies are better done backwards.

Perhaps just converting the new algorithm to a forward copy would give the same improvements?
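
One way to read the single-comparison check mentioned above is as a test for whether a high-to-low copy is safe, with a low-to-high copy as the fallback. The sketch below follows that reading with plain byte loops; the function names are hypothetical and this is not glibc's code.

```c
#include <stddef.h>
#include <stdint.h>

/* One unsigned comparison: true when a high-to-low copy cannot overwrite
 * source bytes it has not read yet. If src lies below dst the subtraction
 * wraps around to a very large value, so the test passes in that case. */
static int backward_copy_is_safe(const void *dst, const void *src, size_t len)
{
    return (size_t)((uintptr_t)src - (uintptr_t)dst) >= len;
}

/* Copy len bytes, choosing the direction that is safe for this overlap. */
void *copy_either_way(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if (backward_copy_is_safe(dst, src, len)) {
        while (len--)               /* high addresses first */
            d[len] = s[len];
    } else {
        for (size_t i = 0; i < len; i++)   /* low addresses first */
            d[i] = s[i];
    }
    return dst;
}
```

The check itself is one subtraction and one compare, which is the basis for the guess that it would be cheap next to the copy; the sketch does not measure that.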

Glibc change exposing bugs

Posted Nov 11, 2010 11:03 UTC (Thu) by slashdot (guest, #22014) [Link]

BTW, several (most?) distributions use eglibc instead of glibc, so there is hope that the more sensible eglibc maintainers will change the new copy code to work forwards.

Glibc change exposing bugs

Posted Nov 11, 2010 12:25 UTC (Thu) by NikLi (guest, #66938) [Link] (1 responses)

Having seen all sorts of crazy memcpies using MMX (glibc, DirectFB, kernel, etc, etc) and having tried to benchmark them, observing huge amounts of noise in the experiments, I have come to the conclusion that the best memcpy one can use is gcc's __builtin_memcpy.

There is also a big advantage in doing that: hopefully gcc in some cases can detect the alignment of pointers at compile-time and use even faster variants, which is even more important.
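
As a small illustration of the builtin with a size known at compile time, here is a sketch that assumes GCC or Clang, since __builtin_memcpy is a compiler extension. The helper names load_u32 and store_u32 are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: read a 32-bit value from a possibly unaligned
 * address. The size is a compile-time constant, so the compiler will
 * normally expand the builtin inline instead of calling the library. */
static inline uint32_t load_u32(const void *p)
{
    uint32_t v;
    __builtin_memcpy(&v, p, sizeof v);
    return v;
}

/* Matching store, again with a constant size. */
static inline void store_u32(void *p, uint32_t v)
{
    __builtin_memcpy(p, &v, sizeof v);
}
```

The flip side, raised elsewhere in this thread, is that a copy expanded inline cannot be interposed with LD_PRELOAD.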

At least we hope that the gcc devs will remain sane (inclusion of "go" frontend is scary knowing that google tends to withdraw services and software without much thought (wave, etc))...

Glibc change exposing bugs

Posted Nov 11, 2010 14:49 UTC (Thu) by nix (subscriber, #2304) [Link]

The worst that will happen to go is that it suffers the fate of the CHILL frontend (nobody uses it, nobody remembers it exists, it's removed many years after introduction).

Go is not just 'google': Go (in GCC) is Ian Lance Taylor, who is a very-long-standing GCC hacker who doesn't have a record for abandonware (hell, he put out a new release of Taylor UUCP not too long ago, and how old is *that*?)

Glibc change exposing bugs

Posted Nov 12, 2010 14:32 UTC (Fri) by Wol (subscriber, #4433) [Link] (3 responses)

Please define a "backward copy".

Bear in mind Intel processors are arse-about-face (otherwise known as little-endian). Running on a big-endian processor, there is a clear "top" and "bottom". So we can define forwards and backwards.

But on Intel, let's say I want to write the number 1,234,567,890. And my processor has a 3-digit word size. It actually physically exists in the system as 890,567,234,1 ! So where's the top, bottom, front or back?

The other question, of course, is does the address register increment or decrement faster. There's no reason why those two operations should be equal cost (there's no reason why they shouldn't be, either :-) And if they're different, the result will be a difference in speed going forward or backwards.

Cheers,
Wol

Glibc change exposing bugs

Posted Nov 15, 2010 9:47 UTC (Mon) by mpr22 (subscriber, #60784) [Link] (2 responses)

I must confess to being utterly boggled by the notion of a backwards block copy (decrementing address) being faster than the forward (incrementing address) version. I mean, doesn't backward copying break the memory controller prefetch?

Glibc change exposing bugs

Posted Nov 15, 2010 11:50 UTC (Mon) by cladisch (✭ supporter ✭, #50193) [Link] (1 responses)

Backward copying avoids cache address aliasing effects on these processors:
http://lists.freedesktop.org/archives/pixman/2010-August/...

Glibc change exposing bugs

Posted Nov 15, 2010 12:15 UTC (Mon) by slashdot (guest, #22014) [Link]

That rationale seems a bit dubious.

In particular, won't just doing all reads before all writes ensure no aliasing regardless of CPU operation?
I think there are enough callee-clobbered registers on x86-64 to allow that.

That is, do this:
movq (%rsi), %rax
movq 8(%rsi), %rdx
movq %rax, (%rdi)
movq %rdx, 8(%rdi)

Also, their backward copy obviously aliases if rsi is 0xf00c instead of 0xf004. I'm not sure why either of these cases should be intrinsically more frequent.
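
A C rendering of the load-everything-then-store idea for a fixed 16-byte block, as a sketch only. The name copy16_load_then_store is hypothetical, and nothing here shows whether this ordering avoids the cache aliasing stalls under discussion; it only shows that the result is correct when the two regions overlap.

```c
#include <stdint.h>
#include <string.h>

/* Copy exactly 16 bytes. Both 8-byte loads go into locals before either
 * store happens, so the result is the same as memmove() for any overlap
 * between src and dst. The fixed-size memcpy() calls are just a portable
 * way to express unaligned 8-byte loads and stores; the locals never
 * overlap src or dst, so each of those calls is well defined. */
void copy16_load_then_store(void *dst, const void *src)
{
    uint64_t lo, hi;

    memcpy(&lo, src, 8);
    memcpy(&hi, (const unsigned char *)src + 8, 8);
    memcpy(dst, &lo, 8);
    memcpy((unsigned char *)dst + 8, &hi, 8);
}
```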

This isn't either/or. Phase in such changes instead!

Posted Nov 11, 2010 15:24 UTC (Thu) by dwheeler (guest, #1216) [Link] (20 responses)

This isn't an either/or situation. The glibc folks have a great point that it's absurd to presume that a call preserves some functionality, when it has never guaranteed it and the various documentation available SPECIFICALLY says to not depend on it. But Torvalds also has a point that functionality not officially guaranteed, but depended on by real programs, shouldn't be lightly disregarded.

I think the solution for stuff like this is to phase in major changes, in a slower way. First, clearly document that "it used to work this way in practice, but soon it won't". Implement the new semantics in a "testing" library so that people can test it out before it goes "live", but don't ram it down production systems at first. Document *how* to run the testing situations clearly and obviously; libc_g and friends are essentially impossible to find, even if you know they exist. Then, after some time, switch. Yes, even with all this, somebody will be caught off guard, but the list of impacts will be a lot shorter (and thus more manageable). Also, if you've warned people, many people will be looking for that kind of problem, making it much easier to identify and fix the stragglers.

This isn't either/or. Phase in such changes instead!

Posted Nov 11, 2010 15:36 UTC (Thu) by jwb (guest, #15467) [Link] (1 responses)

I assure you that even if all those pointless hoops had been jumped through, Flash would still have been broken when the switch finally happened. You would simply have been punishing the users and free software developers for nothing.

This isn't either/or. Phase in such changes instead!

Posted Nov 11, 2010 19:40 UTC (Thu) by sgros (guest, #36440) [Link]

This is very interesting... people don't read specs, at least not carefully and tend to blindly generalize.

There are so many broken programs because someone tested something in a specific environment and it happened to work in that particular case and that test finishes with the broad conclusion it will always work.

Network is another example. I heard people, writing networking code that directly accesses Ethernet, claim that frames smaller than 46 octets are perfectly OK. Yes, they are, until some user starts using that code in different environment that is strict with respect to specs.

In the end, I'm not for helping bad programs and lazy programmers (lazy in negative, not positive sense!)

This isn't either/or. Phase in such changes instead!

Posted Nov 11, 2010 19:25 UTC (Thu) by xilun (guest, #50638) [Link] (13 responses)

In this case, the semantics have _not_ changed. You just can't pretend, from the user side, that nasal demons are doing predictable things... The semantics have never included the observed behavior of random faulty programs, and never will. In C, when you write a faulty construct that has undefined behavior, there is no way to predict what will really happen on a particular system, even when you know both the exact compiler version and the exact glibc version: any unrelated change in the same file could trigger other optimisations and could change how your program will fail, or even mask the failure completely.

The standard preconditions users have to observe _are_ part of the semantics, and neither those nor the correct behavior of glibc when you do observe them has changed, so in no way is this particular memcpy optimization a major change. Actual preconditions are often relaxed in a given implementation, but unless that is documented in an additional standard, the way they are relaxed will never be the same between two implementations or two versions of the same one, so nobody can pretend to reliably take advantage of undocumented relaxed preconditions.

Were that particular memcpy change considered a major change, _every_ glibc change would need to be considered a major change.

In other words, when a language has defined from the beginning of time that trying some operations results in undefined behavior (and has always been consistent about this definition since), and when a system does not provide further guarantees, then it does not matter what the observed behavior is with version X of the compiler, Y of the libc, and processor Z with die revision T -- changing any of X, Y, Z or T, or even seemingly unrelated parts of the faulty program, can cause it to explode violently, and eventually will because of Murphy's law. It will still do so even if you blame glibc developers for your own mistakes.

Every C programmer should know the distinction between implementation-defined behavior, undefined behavior, and unspecified behavior -- otherwise he should rather program in another language... You'd also better have some notion of how compilers, sometimes in a way related to associated libraries, can take advantage of explicitly undefined and unspecified behaviors to do some optimization. Ceasing to do that would be hugely ridiculous, as ridiculous as ceasing to simplify boolean equations by taking advantage of "don't care" outputs, or even ceasing to automatically factorize redundant computations.

If you know the difference, but just don't like that C has undefined behaviors, or that C compilers and other associated system stacks target efficient code, sometimes by taking advantage of explicitly undefined behaviors, well, it's not going to change any time soon. So in this case you also don't have any choice: use another language.

This isn't either/or. Phase in such changes instead!

Posted Nov 11, 2010 22:34 UTC (Thu) by dafid_b (guest, #67424) [Link] (12 responses)

This thread is quite strange.

On one side there are arguments that a change that is made by free software purists that happens to break pre-existing programs is good - because FLASH is one of the broken programs...

On the other side are arguments that users systems are exposed to corruptions due to changes in the behaviour of a library call made for a marginal optimisation of a utility function.

To put it in perspective: I do not want the software I rely on to have ONE randomly inserted bug activated for a 200% improvement of its overall performance.

That bug could be the one a hacker uses to observe my credit-card details when paying for LWN subscription.

The proposed benefit is 20% of 2%, or 0.4%. The possible cost is my bank account.

I hope that the packagers of the distributions do the sensible thing.

That is: pull that change out and shoot it.

This isn't either/or. Phase in such changes instead!

Posted Nov 11, 2010 22:57 UTC (Thu) by dgm (subscriber, #49227) [Link] (6 responses)

It's quite simple: ask Adobe to fix the Flash player, or to open source it, so we can fix it ourselves.

This isn't either/or. Phase in such changes instead!

Posted Nov 11, 2010 23:09 UTC (Thu) by dafid_b (guest, #67424) [Link] (5 responses)

did you see this early comment in the thread...

"Actually I think we may have first seen this with squashfs. Problems showed up right before the F14 alpha. Phillip found the cause of the problem was using memcpy instead of memmove."

So there are at least two bugs exposed by this change in Glibc.

There may be more. There are a vast number of applications out there still waiting to be tested.

It is just impolite to cause users to do the testing when you don't have to.

Dave

This isn't either/or. Phase in such changes instead!

Posted Nov 12, 2010 4:31 UTC (Fri) by mrshiny (subscriber, #4266) [Link] (4 responses)

Exactly. And given that Glibc has symbol versioning the performance can be had already for anyone who recompiles their code. Remember: this bug exists for all users of glibc even if they compiled their apps a long time ago but recently updated glibc. The fact that Flash is proprietary is slightly irrelevant to the discussion.

This isn't either/or. Phase in such changes instead!

Posted Nov 17, 2010 15:14 UTC (Wed) by meuh (guest, #22042) [Link] (3 responses)

> Remember: this bug exists for all users of glibc even if they compiled their apps a long time ago but recently updated glibc.

It's not a bug. And it doesn't affect all users.

Hopefully, legitimate uses (with regard to the specification) of memcpy() are not affected by the optimisation in newer glibc.

This isn't either/or. Phase in such changes instead!

Posted Nov 17, 2010 15:52 UTC (Wed) by mrshiny (subscriber, #4266) [Link] (2 responses)

Yes, it is a bug. Sure, the application is responsible for using APIs properly. But here we have a situation where a library has worked one way for years, and then suddenly works a different way. There was no way for the apps in question to detect the bugs because the code worked perfectly before. Now, due to a library upgrade, those apps don't work. In some cases there is data corruption. The corruption might happen silently. There is no way to be sure that this change is not quietly damaging untold amounts of data without auditing every use of memcpy everywhere to ensure that it is doing the right thing.

And this means that not only do you have to fix all source code which is wrong and issue new binaries, but you shouldn't upgrade to this version of Glibc because you might have an app somewhere that wasn't fixed, or isn't fixed in the version you have installed.

Glibc is a critical library in the system. Almost every program uses it. As such, it is their responsibility to treat ABI changes very carefully. Sure, this is not a change in the specification, it is an unintended consequence and it's due to those stupid lazy programmers who didn't read the spec or didn't care or whatever. Or inadvertently introduced errors when their code was changed. Or changed something without realizing that this change would result, somewhere, in a call to overlapping memcpy. Given that the bug was hard to identify (at least for some cases), and given that Glibc has symbol versioning, maybe they should use it?

Your last sentence sums up the problem: "Hopefully legitimate uses are not affected". I think we should expect stronger guarantees from glibc than "hopefully".

This isn't either/or. Phase in such changes instead!

Posted Nov 17, 2010 16:09 UTC (Wed) by meuh (guest, #22042) [Link]

Hopefully, every programmer runs their programs under valgrind once in a while.

This isn't either/or. Phase in such changes instead!

Posted Nov 17, 2010 16:23 UTC (Wed) by xilun (guest, #50638) [Link]

> There is no way to be sure that this change is not quietly damaging untold amounts of data without auditing every use of memcpy everywhere to ensure that it is doing the right thing.

There is also no way to be sure that this change is not _fixing_ untold amounts of data corruption when the memcpy is done backward without auditing every use of memcpy :)
Anyway, C being what it is, it is a little ridiculous to fixate on that particular change, because other changes exposing bugs are made every day, hundreds at a time. So really, you have no way after ANY upgrade to be sure that memory corruption won't mysteriously happen where it previously did not. If that's a problem for you, don't ever update anything => problem magically solved.

> given that Glibc has symbol versioning, maybe they should use it?

Nope. Symbol versioning is for ABI changes, and symbol versioning does not even pretend to automatically solve every problem ABI changes have been shown to cause. The memcpy implementation change is not even an ABI change.

> Your last sentence sums up the problem: "Hopefully legitimate uses are not affected". I think we should expect stronger guarantees from glibc than "hopefully".

There was a problem only in the sentence. The "hopefully" is not needed. Legitimate users of memcpy will not be affected.

This isn't either/or. Phase in such changes instead!

Posted Nov 12, 2010 0:08 UTC (Fri) by xilun (guest, #50638) [Link] (4 responses)

> On one side there are arguments that a change that is made by free software purists that happens to break pre-existing programs is good - because FLASH is one of the broken programs...

I fail to see how my previous post, which you replied to, is in any way related to free software purists happy to break Flash.

Indeed I think you did not even read it.

So I'll make an executive summary (but with new elements, for those who follow): in http://www.coding-guidelines.com/cbook/cbook1_2.pdf ; read, starting at pdf page 183, 3.4 behavior, 3.4.1 implementation-defined behavior, 3.4.3 undefined behavior, and 3.4.4 unspecified behavior. You will then hopefully understand why it would indeed be *dangerous* for security (not even talking about performance) in the long term if a widely used implementation starts giving guarantees defining "undefined behaviors", or if the maintainers of such implementation start acting like there seems to be some guarantees. (Think about other compliant implementations.)

If you don't like implementation-defined, undefined and unspecified behaviors in programming languages, use Java. I'm indeed starting to wonder if Linus does not secretly dream about writing operating systems in Java -- look at some of his responses during the NULL-page mapping debacle, on GCC adding optimizations that take advantage of undefined behavior on integers, and his position on this memcpy implementation.

> On the other side are arguments that users systems are exposed to corruptions due to changes in the behaviour of a library call made for a marginal optimisation of a utility function.

Users systems are exposed to corruptions because they wrote code having undefined behavior in the first place, and there should be neither surprise nor scandal when code containing faulty constructs having undefined behavior starts to behave in an undefined way, because that's precisely the definition of what "undefined behavior" means.

Undefined behavior could as well change observable behavior depending on your power supply, the phase of the moon, and the fact that Linus has been personally annoyed by a random bug (the last cause being the most probable in those examples, which is a little weird from an economical perspective, but oh well). Blame glibc maintainers all you want, but you'll soon have multiple targets when the next advance in GCC exposes other bugs caused by other undefined behaviors.

> To put it in perspective: I do not want the software I rely on to have ONE randomly inserted bug activated for a 200% improvement of its overall performance.

Under which perspective are bugs not "randomly" inserted (given non-malicious intent in the first place)? Would you be OK with "ONE randomly inserted bug activated" because of a change for support of new hardware, or a new feature? Do you realize that even a bug fix can activate other bugs? Do you realize that you can easily avoid all that kind of trouble by NOT upgrading your system library ever, if you really want to? Do you understand that optimizations made at the system level follow different economics than optimizations made at the application level? Do you understand that compiler/library evolution has contributed to Moore's law, and that your computer would be maybe 4x slower or produce 4x more heat if we were still in the naive compiler era and if low-level layers had not been updated to be efficient with modern processor architectures?

> The proposed benefit is 20% of 2%, or 0.4%. The possible cost is my bank account.

Given the nature of the memory corruption, it is very unlikely to have that kind of security impact (but not 100 % impossible).

What is really funny is that even without the incriminated patch, memcpy was NOT a memmove (fortunately). This particular Flash call resulted in corrupted data when copying memory in reverse order because the pointers were in a specific order and the areas overlapped. Calling memcpy with the previous glibc implementation, and probably 99% of the implementations existing on earth, will still result in data corruption if the memory areas overlap in the other order.
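
The point that each copy direction corrupts one of the two overlap orders can be seen with two naive byte loops. This is only a sketch with hypothetical names (copy_forward, copy_backward); neither loop is glibc's old or new implementation.

```c
#include <stddef.h>

/* Copy n bytes from the lowest address upward. */
void copy_forward(unsigned char *d, const unsigned char *s, size_t n)
{
    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
}

/* Copy n bytes from the highest address downward. */
void copy_backward(unsigned char *d, const unsigned char *s, size_t n)
{
    while (n--)
        d[n] = s[n];
}
```

Starting from the eight bytes "ABCDEFGH" in buf, copy_forward(buf + 2, buf, 4) leaves "ABABABGH" where memmove() would leave "ABABCDGH", while copy_backward(buf, buf + 2, 4) leaves "EFEFEFGH" where memmove() would leave "CDEFEFGH". Swapping the two loops gives the memmove() result in both cases, so neither direction alone is safe for every overlap.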

So I suggest you run your whole system LD_PRELOAD'ing all processes with a library that calls memmove instead of memcpy, if you are worried too much about that.
I also suggest that you immediately start looking for other bugs more likely to have big security impacts than this class, and that you also work around them in weird ways instead of fixing them correctly in the first place.
Maybe it would indeed be easier for you to take a really old distribution, with a compiler that does very few optimizations, and a very simple libc, and stick to it forever. (Well, you'll still have to do the memcpy/memmove replacement trick, but you'll have very few optimizations, so I guess that will make you happy.)

And oh, I forgot to tell you: randomly defining "undefined behaviors" without auditing every component involved in both the system and its construction can sometimes expose bugs with a high security impact. See the NULL page mapping debacle.

> I hope that the packagers of the distributions do the sensible thing.
> That is: pull that change out and shoot it.

Yeah, they all are reading LWN comments, waiting for your enlightenments.

This isn't either/or. Phase in such changes instead!

Posted Nov 12, 2010 1:10 UTC (Fri) by dafid_b (guest, #67424) [Link] (3 responses)

Firstly, I am sorry: I wrote as if I was posting a new comment at the top level, but posted it as a reply to your post, and then corrected that by reposting my clarified thoughts at the top level. Not the right way to do things.

However, I think that the conversation we are having in this thread is a bit disjointed because when I say 'user' I mean a 3rd party to this conversation - not the Flash developer, not the Glibc developer, nor the crushfs developer.. but a simple user:). Whereas I think you read my user as 'developer'.

In the later post I liken (the knowing continued) delivery of this change to Glibc to mugging the person (user) who is near (uses the software written by) a jay-walker(developer who used undefined behaviour that used to work in the past).

That does not seem very fair to the user. It is sure to convince most users to stop being Linux users if the change does cause a security issue to happen - and they find out that it was a deliberate choice.

I think a better policy would be to mug the developer (send the crash reports, mocking messages in the trade press, or whatever).

This could be done by putting an intercept layer in front of Glibc, usable in system tests and loadable by any user at a known performance cost, that logs such violations of API requirements.

I would be happy, ecstatic even, to take part in such a mugging, when I am not doing my banking on the system.

Thanks for pointing out my other mistake - I should have said 'randomly activated bugs' rather than 'randomly inserted bugs' - as from both the end-user and developer perspective that is what is happening.

On your other points, I agree. However I think that the problem the points address is developer behaviour, and the person you mug is the user.

end user should not be punished, depending of distro target

Posted Nov 12, 2010 2:28 UTC (Fri) by xilun (guest, #50638) [Link] (2 responses)

Even given my position that glibc bears zero responsibility for the way Flash misbehaves, I agree with you that the end user should, when possible, not be punished. I think this is clearly the job of distributions, which can identify, given their target audience and if they have the opportunity and resources, which interactions exposing a bug in some piece of software are acceptable and which are not. Maybe some distributions will indeed make the choice of temporarily reverting this optimisation, but I hope Adobe will simply have enough time to fix their stuff before a lot of distros are released. If the timing is OK, maybe the distributions could pressure Adobe by sending a clear message that they will ship with the best technology anyway, so that it's up to Adobe to clean up their mess if they want Flash to continue working correctly under distro X. If the timing is bad, I would clearly prefer an automated LD_PRELOAD-style workaround (if done transparently for the end user) to a system-wide disabling of an optimisation.

But I would not even be angry at a distribution that chooses not to care about Flash at all. I perfectly understand that some may not care about Flash in the slightest, in which case an angry user should just apply a workaround himself or switch to another distro, if he is indeed not in the target audience of the one he uses.

end user should not be punished, depending of distro target

Posted Nov 12, 2010 5:03 UTC (Fri) by dafid_b (guest, #67424) [Link]

we are in agreement mostly.

end user should not be punished, depending of distro target

Posted Nov 12, 2010 19:28 UTC (Fri) by charlieb (guest, #23340) [Link]

> Maybe some distributions will indeed make the choice of temporarily
> reverting this optimisation

And wouldn't it be nice if Fedora 14 were to do this :-)

This isn't either/or. Phase in such changes instead!

Posted Nov 11, 2010 23:04 UTC (Thu) by dgm (subscriber, #49227) [Link] (1 responses)

How long should the changes be delayed? A year? Two?
By that time maybe the targeted machines will no longer exist. Remember, those processors are not exactly new (Core2 and Atom).

This isn't either/or. Phase in such changes instead!

Posted Nov 12, 2010 18:26 UTC (Fri) by chad.netzer (subscriber, #4257) [Link]

And they are certainly not old, either. This optimization will be relevant for 3-5 years, at least. And the answer to your question is 6 months. ;)

This isn't either/or. Phase in such changes instead!

Posted Nov 12, 2010 0:10 UTC (Fri) by bojan (subscriber, #14302) [Link]

> Implement the new semantics in a "testing" library so that people can test it out before it goes "live", but don't ram it down production systems at first.

We are talking about a brand new Fedora release, with an as-yet-unreleased version of glibc, 2.12.90. At some point software has to get shipped in order to get tested by real users.

This isn't either/or. Phase in such changes instead!

Posted Nov 12, 2010 11:06 UTC (Fri) by marcH (subscriber, #57642) [Link]

> I think the solution for stuff like this is to phase in major changes, in a slower way.

This would be a far too reasonable and professional approach. It would carry a very high risk of fewer flamewars.

Glibc change exposing bugs

Posted Nov 11, 2010 23:03 UTC (Thu) by dafid_b (guest, #67424) [Link] (7 responses)

I made an earlier comment at the wrong level - both technically - to the wrong part of the discussion, and also from the purely selfish perspective that the change in Glibc behaviour could break existing systems - potentially costing me my cash.

An earlier suggestion was to replace the change with an API violation detector that causes programs to crash rather than corrupt their state.
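As a rough idea of what such a detector involves, here is a minimal sketch in C. The names `ranges_overlap` and `checked_memcpy` are made up for illustration and are not part of glibc; the pointer comparison goes through `uintptr_t`, which assumes a flat address space.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* True if [a, a+n) and [b, b+n) share at least one byte. */
static bool ranges_overlap(const void *a, const void *b, size_t n)
{
    uintptr_t x = (uintptr_t)a;
    uintptr_t y = (uintptr_t)b;
    uintptr_t gap = x > y ? x - y : y - x;

    return n != 0 && gap < n;
}

/* Hypothetical strict copy: crash on an API violation instead of corrupting data. */
static void *checked_memcpy(void *dest, const void *src, size_t n)
{
    if (ranges_overlap(dest, src, n))
        abort();
    return memcpy(dest, src, n);
}
```

The check itself is two comparisons and a subtraction, so the cost is small; the argument in this thread is about what should happen when it fires, not about whether it can be done.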

This is better than a silent corruption - but still antisocial. A program that used to work now fails. Not everyone can fix the cause of the crash. Not all software is unimportant to the user. It is a bit like suggesting that when you see a jay-walker you should just mug the person next to them, as a deterrent for future jay-walkers.

I think it is fine for testing releases to use the crash-badly-behaving-applications change.

But the system released to users should provide the previously working Glibc.

And the Glibc developers should listen to Linus.

Cause, I really would prefer my software to work.

Selfish? Yes.

Glibc change exposing bugs

Posted Nov 12, 2010 0:15 UTC (Fri) by xilun (guest, #50638) [Link] (6 responses)

> Cause, I really would prefer my software to work.

Then do some system level tests.

Glibc maintainers are not responsible for your system integration and QA.

> Selfish? Yes.

Indeed.
But considering that the Glibc maintainers are not generous to the point that they will do your system integration and QA, we have an impedance mismatch here, and they will just do as they want, which in the end seems quite logical.

Glibc change exposing bugs

Posted Nov 12, 2010 1:51 UTC (Fri) by dafid_b (guest, #67424) [Link] (5 responses)

Yes - More on that system level testing..

I would like to do that in my spare time - not many hours - and leverage my subsequent relaxation browsing time...

What I am thinking is based on rough understanding, so please pass along any hints.

My idea is to provide a memcpy() that can safely be used in any application with minimal changes to the software behaviour, and yet provide logging of bad usage for proactive corrections.
It is ok if the system is slower.. but it should still work.

I don't really know how to do the logging as it should:
* not interfere with threads, signals etc
* be always available

The memcpy() is pretty simple..
A replacement memcpy() based on a combination of memmove() to test the parameters and the old memcpy() to provide the implementation, for the stability of my software.
When a memcpy() call is made with bad args that would normally invoke the special logic, an error is logged by PID(?) and the old memcpy() is still invoked to deliver the vanilla experience.
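A minimal sketch of that log-and-carry-on idea in C, under loud assumptions: `old_style_copy` is a plain forward byte loop standing in for "the old memcpy()" (the real one is more elaborate), and `overlap_reports` is a bare counter standing in for a real log. All three names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical counter standing in for a real log of violations. */
static unsigned long overlap_reports;

static void report_overlap(const void *dest, const void *src, size_t n)
{
    (void)dest; (void)src; (void)n;  /* a real logger would record these */
    __sync_fetch_and_add(&overlap_reports, 1UL);
}

/* Stand-in for the old memcpy(): plain forward byte copy. */
static void *old_style_copy(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
    return dest;
}

/* Report overlapping arguments, then behave exactly as before. */
static void *logging_memcpy(void *dest, const void *src, size_t n)
{
    uintptr_t d = (uintptr_t)dest, s = (uintptr_t)src;
    uintptr_t gap = d > s ? d - s : s - d;

    if (n != 0 && gap < n)
        report_overlap(dest, src, n);
    return old_style_copy(dest, src, n);
}
```

Note that the data still gets copied the old way, including any corruption the old way produced; the only new behaviour is the report.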

The replacement sounds like building a patched Glibc.

Any suggestions or hints for how to do the logging would be appreciated.

Glibc change exposing bugs

Posted Nov 12, 2010 5:00 UTC (Fri) by dafid_b (guest, #67424) [Link] (3 responses)

I hacked up something around shared memory, which seems to me (based on little experience) the most likely to be safe.

If this approach is reasonable, then I just need to link it into memcpy as outlined above to have a trace of the last few hundred errors on the system in shared memory, waiting to be dumped.

Thoughts?

$ dd of=/tmp/data if=/dev/zero count=16

$ g++ test.cpp

$ ./a.out /tmp/data
ret=0x80489de, dest=0x976d008, src=0x8048b4d, len=4, pid=205b

$ cat test.cpp
#include <sys/types.h>
#include <unistd.h>
#include <sys/mman.h>
#include <syscall.h>
#include <stdio.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <errno.h>
#include <string.h>

#define SHARED_MEM_SIZE 8192

struct data // trace data for a memcpy() error
{
    pid_t pid ;
    const void *ret ;
    const void *dest ;
    const void *src ;
    size_t len ;
} ;

struct log
{
    long int index ; // sequence or index of last entry used
    struct data entry[ SHARED_MEM_SIZE/sizeof(struct data) - 1 ] ; // vector of logging instances
} ;
struct log *pLog ;
#define N_ENTRIES (sizeof(pLog->entry)/sizeof(pLog->entry[0]))

void
displayLog(long i)
{
    int j = i % N_ENTRIES ; // restrict index.

    fprintf(stderr, "ret=%p, dest=%p, src=%p, len=%ld, pid=%lx\n",
        pLog->entry[j].ret,
        pLog->entry[j].dest,
        pLog->entry[j].src,
        (long)pLog->entry[j].len,
        (unsigned long)pLog->entry[j].pid
    ) ;
}

void
capture(const void *dest, const void *src, size_t n)
{
    long newLoc, oldLoc ;

    // claim the next slot; retry if another writer bumped the index first
    do {
        oldLoc = pLog->index ;
        newLoc = oldLoc + 1 ;
    } while ( !__sync_bool_compare_and_swap( &pLog->index, oldLoc, newLoc ) ) ;

    int j = newLoc % N_ENTRIES ; // restrict index.
    pLog->entry[j].ret = __builtin_return_address(0) ; //__builtin_extract_return_address(ra) ;
    pLog->entry[j].dest = dest ;
    pLog->entry[j].src = src ;
    pLog->entry[j].len = n ;
    pLog->entry[j].pid = getpid() ;
}

void *
setup(char *av)
{
    void * p = NULL ;
    int fd ;

    fd = open(av, O_RDWR, 0x777) ; // open the file.
    if (fd < 0)
    {
        fprintf(stderr, "Failed to open file /%s/ errno=%d\n", av, errno) ;
        return 0 ;
    }
    p = mmap(0, SHARED_MEM_SIZE, PROT_WRITE|PROT_READ, MAP_SHARED, fd, 0) ;
    if (p == MAP_FAILED) // mmap reports failure with MAP_FAILED, not 0
    {
        fprintf(stderr, "Failed to mmap file /%s/ errno=%d\n", av, errno) ;
        close(fd) ;
        return 0 ;
    }
    // have mapping in p of SHARED_MEM_SIZE bytes
    fprintf(stderr, "mapped %s to %p on fd %d\n", av, p, fd) ;

    return p ;
}

int main(int ac, char **av)
{
    if (ac > 1)
    {
        pLog = (struct log*)setup(av[1]) ;
    }
    else
        fprintf(stderr, "map <file>\n") ;

    if (pLog)
    {
        capture(strdup("pete"), "joe", 4 ) ;
        displayLog(pLog->index) ;
    }

    return 0 ;
}

Glibc change exposing bugs

Posted Nov 12, 2010 6:44 UTC (Fri) by cmccabe (guest, #60281) [Link] (2 responses)

> fd = open(av, O_RDWR, 0x777)

This is not correct. You want

> fd = open(av, O_RDWR, 0777)

Yes, it's an octal constant. Or use the symbolic constants.
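For anyone who has not been bitten by this before, a small sketch of the difference in C. The names `mode_symbolic` and `check_mode_constants` are made up for the example; the `S_IRWX*` macros come from `<sys/stat.h>`. Note also that open() only consults the mode argument when O_CREAT (or O_TMPFILE) is among the flags, so with plain O_RDWR the value is ignored either way.

```c
#include <assert.h>
#include <sys/stat.h>
#include <sys/types.h>

/* rwxrwxrwx spelled with the POSIX symbolic constants. */
static const mode_t mode_symbolic = S_IRWXU | S_IRWXG | S_IRWXO;

static void check_mode_constants(void)
{
    assert(0777 == 511);            /* octal: 7*64 + 7*8 + 7 */
    assert(0x777 == 1911);          /* hex: 7*256 + 7*16 + 7 */
    assert(0x777 == 03567);         /* the permission bits the hex value really encodes */
    assert(mode_symbolic == 0777);  /* the symbolic form matches the octal one */
}
```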

Also, this is more of a personal preference thing, but bumpyCaps and Hungarian notation are frowned on by most.

In a larger sense, I think you don't want to rebuild glibc. You probably just want to use "the LD_PRELOAD trick".

If I were you, I would print my nastygrams to syslog, using the syslog(3) function. Most sysadmins don't check random areas of shared memory that often. If you do choose to use shm, try shm_open.

cheers,
C.

Glibc change exposing bugs

Posted Nov 12, 2010 9:09 UTC (Fri) by dafid_b (guest, #67424) [Link] (1 responses)

Thanks for the code review, much appreciated.

Is syslog() safe to call at this point?
It generates formatted output, which seems like it could itself call memcpy() or do other stuff in the library that the app did not allow for in its plan when it called memcpy.
Also, is the system call that sends the message to the log safe, or can it have side effects such as signals and new error codes in errno?

I would be very happy if the answer to the above is: syslog() is safe to call like this, with no side effects.

Glibc change exposing bugs

Posted Nov 13, 2010 3:02 UTC (Sat) by cmccabe (guest, #60281) [Link]

> Is syslog() safe to call at this point?
> It generates formatted output, which seems like it could itself call
> memcpy() or do other stuff in the library that the app did not allow for in
> its plan when it called memcpy

You raise a good issue. glibc's version of syslog is known to call malloc sometimes, which means that you shouldn't use it from within a signal handler. Surprisingly, memcpy isn't on the official list of "async-signal safe" functions, so you could argue that such an implementation would be POSIX conforming :)

But seriously. I think the best thing to do is probably implement your own version of syslog with no memory allocations or calls to memcpy. It's pretty easy to do in a few hundred lines. I had to do it before when writing a good signal handler.
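To give a flavour of what that looks like, here is a much-reduced sketch in C, not the few hundred lines mentioned above. It formats into a stack buffer by hand and emits the line with a single write(2), saving and restoring errno around the call. The function names (`put_str`, `put_hex`, `format_record`, `log_record`) and the message layout are made up for illustration. One caveat: a compiler may still turn simple loops into library calls at higher optimisation levels, so a real implementation would want to check the generated code.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Append the NUL-terminated string s at offset pos; never writes past cap. */
static size_t put_str(char *buf, size_t pos, size_t cap, const char *s)
{
    while (*s != '\0' && pos < cap)
        buf[pos++] = *s++;
    return pos;
}

/* Append v in hexadecimal with a 0x prefix; never writes past cap. */
static size_t put_hex(char *buf, size_t pos, size_t cap, uintptr_t v)
{
    static const char digits[] = "0123456789abcdef";
    char tmp[2 * sizeof v];  /* two hex digits per byte */
    size_t n = 0;

    do {
        tmp[n++] = digits[v & 0xf];
        v >>= 4;
    } while (v != 0);

    pos = put_str(buf, pos, cap, "0x");
    while (n > 0 && pos < cap)
        buf[pos++] = tmp[--n];
    return pos;
}

/* Build one log line in buf and return its length (no terminating NUL). */
static size_t format_record(char *buf, size_t cap,
                            const void *dest, const void *src, size_t len)
{
    size_t pos = 0;

    pos = put_str(buf, pos, cap, "memcpy overlap: dest=");
    pos = put_hex(buf, pos, cap, (uintptr_t)dest);
    pos = put_str(buf, pos, cap, " src=");
    pos = put_hex(buf, pos, cap, (uintptr_t)src);
    pos = put_str(buf, pos, cap, " len=");
    pos = put_hex(buf, pos, cap, (uintptr_t)len);
    pos = put_str(buf, pos, cap, "\n");
    return pos;
}

/* Emit one record on fd without malloc or stdio, leaving errno untouched. */
static void log_record(int fd, const void *dest, const void *src, size_t len)
{
    char line[128];
    int saved_errno = errno;
    size_t used = format_record(line, sizeof line, dest, src, len);
    ssize_t ignored = write(fd, line, used);

    (void)ignored;
    errno = saved_errno;
}
```

Calling `log_record(2, dest, src, n)` would send the line to stderr; pointing it at a descriptor opened in advance avoids doing any setup work at the moment of the violation.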

C.

Glibc change exposing bugs

Posted Nov 12, 2010 18:21 UTC (Fri) by chad.netzer (subscriber, #4257) [Link]

Valgrind already does this, and MUCH more:

http://valgrind.org/docs/manual/mc-manual.html#mc-manual....

Not to deprive you of the experience of doing it yourself, which can be instructive. However, you shouldn't need to reinvent the wheel if all you want is the use of the tool. At the very least, you can see how valgrind does it. As for automatically invoking it, well, that's an exercise for the reader. :)

C99 restrict

Posted Nov 12, 2010 11:12 UTC (Fri) by marcH (subscriber, #57642) [Link]

Note that the C99 'restrict' keyword will make the documentation of memcpy() more explicit (I mean for developers who try to actually understand documentation).
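For readers who have not met the keyword, a short sketch in C. The prototypes in the comment are the ones given in the C99 standard; `add_into` is a made-up helper showing the same contract on a user function.

```c
#include <stddef.h>

/*
 * C99 declares memcpy with restrict-qualified pointers and memmove without:
 *
 *   void *memcpy(void * restrict s1, const void * restrict s2, size_t n);
 *   void *memmove(void *s1, const void *s2, size_t n);
 */

/*
 * Hypothetical helper with the same contract as memcpy: the caller
 * promises that dst and src do not overlap, and the compiler is free
 * to rely on that promise when optimising the loop.
 */
static void add_into(int * restrict dst, const int * restrict src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] += src[i];
}
```

The keyword is a promise made by the caller, not a check: passing overlapping arrays still compiles and simply becomes undefined behaviour.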

GCC built-in memcpy

Posted Nov 13, 2010 1:11 UTC (Sat) by rriggs (guest, #11598) [Link]

I am pretty sure that GCC, on at least one architecture I use (SPARC, I think), has a built-in memcpy which is implemented such that overlapping copies fail. I benefit from having this defect made visible on more common processor architectures.

Kernel 2.6.36 broke my CentOS-5 Gnome 2.16 battery info

Posted Nov 25, 2010 17:32 UTC (Thu) by dag- (guest, #30207) [Link] (1 responses)

> The rule is simply "we don't break user-space".

Well, I don't know how general that rule is, because kernel 2.6.36 ripped out an important set of /proc/acpi entries that are still used on older Gnome releases (eg. CentOS-5).

A separate project, named ELRepo, provides backported kernel modules, but also the current mainline kernel built specifically for CentOS-5. Which is great for testing/running the latest kernel with a stable and trusted distribution. Since 2.6.36, not anymore, as my laptop couldn't provide proper ACPI information, and as such couldn't suspend/hibernate before running out of power :-(

More information about this, and other breakage is available from:

http://elrepo.org/tiki/kernel-ml

Kernel 2.6.36 broke my CentOS-5 Gnome 2.16 battery info

Posted Nov 25, 2010 18:10 UTC (Thu) by dag- (guest, #30207) [Link]

I intended to say that ELRepo packages were built specifically on RHEL 5 (but are intended for the various RHEL rebuilds as well). In the case of my laptop, it was using CentOS-5 but in the meantime moved to RHEL 6 instead...


Copyright © 2010, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds