MAP_FIXED_SAFE
Any mmap() call allows the calling process to specify an address for the mapping. In normal operation, though, this address is simply a hint that the kernel is free to ignore. MAP_FIXED exists for cases where the mapping really has to be placed at the requested address or the application will fail to work. The kernel takes this flag seriously, to the point that, if there is already another mapping in the given address range, the existing mapping will be destroyed to make room for the new one. This seems like a strange semantic; if an application wants a mapping at a given area, it should probably be able to take responsibility for making room for that mapping. But mmap() is specified to work that way, so that is what happens.
Needless to say, that can be problematic if the application wasn't aware of the conflicting mapping — something that could occur as the result of a bug, address-space layout randomization, disagreements between libraries, or deliberate manipulation by an attacker. The data contained within that mapping (or the overlapping part of it, at least) will be silently dropped on the floor and the new mapping will show up in its place. The chances of things working correctly after that are likely to be fairly small. In some cases, security vulnerabilities can result; see, for example, CVE-2017-1000253. In that case, the kernel's internal use of MAP_FIXED to load programs into memory was exploited to corrupt the stack.
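The silent replacement described above can be reproduced in a few lines. The following is only a sketch, assuming a Linux system and anonymous private mappings; the function name map_fixed_clobbers() is made up for illustration and does not come from the patch set.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a MAP_FIXED mapping silently replaced
 * existing data, 0 if the data survived, -1 if mmap() failed. */
int map_fixed_clobbers(void)
{
    size_t len = (size_t)sysconf(_SC_PAGESIZE);

    /* Let the kernel choose an address, then store a marker byte there. */
    char *first = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (first == MAP_FAILED)
        return -1;
    first[0] = 'A';

    /* Map again at the same address with MAP_FIXED. No error is reported;
     * the old page is discarded and a fresh zero-filled page replaces it. */
    char *second = mmap(first, len, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    if (second == MAP_FAILED) {
        munmap(first, len);
        return -1;
    }

    int clobbered = (second == first && first[0] == 0);
    munmap(second, len);
    return clobbered;
}
```

The second mmap() call succeeds without any indication that data was lost, which is exactly why an unexpected overlap is so hard to notice.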
A solution can be found in Michal Hocko's MAP_FIXED_SAFE patch set. It adds a new mmap() flag called, surprisingly, MAP_FIXED_SAFE with semantics similar to MAP_FIXED with one exception: the operation will fail if the targeted address range is not free. The kernel's ELF loader is modified to use this new flag when mapping programs into memory; that will cause program loading to fail if two mappings collide, but that is better than the alternative. It is expected that new code would use this new flag in almost all cases, and that older programs would eventually be switched as well.
Some had suggested adding a separate flag to modify the behavior of MAP_FIXED, so that applications would pass something like MAP_FIXED|MAP_SAFE to mmap(). The problem with that approach is that mmap() is one of those system calls that never checked for unknown flags. A program using that construction would, as a result, silently fall back to MAP_FIXED on older kernels that lacked support for the new MAP_SAFE flag. Using a new flag means that, while the application will not get the desired failure status on an older kernel if the address range is not available, it also will not clobber any existing mappings (because the specified address will be treated as a hint by the kernel).
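Because an address passed without MAP_FIXED is only a hint, an application can detect a missed placement by comparing the returned address with the requested one. The sketch below shows that check under stated assumptions: it does not use MAP_FIXED_SAFE itself (the flag is not available in current headers), the names map_exactly_at() and occupied_hint_is_refused() are hypothetical, and reporting EEXIST is this example's own choice rather than anything the patch set is known to specify.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical wrapper: ask for addr as a hint only, and treat any other
 * placement as a failure. Existing mappings are never replaced. */
void *map_exactly_at(void *addr, size_t len)
{
    void *p = mmap(addr, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return MAP_FAILED;
    if (p != addr) {
        /* The kernel put the mapping elsewhere; undo it and report failure. */
        munmap(p, len);
        errno = EEXIST;   /* assumption: error code chosen for this sketch */
        return MAP_FAILED;
    }
    return p;
}

/* Returns 1 if a request for an occupied address is refused and the
 * existing data is left intact, 0 otherwise, -1 if setup failed. */
int occupied_hint_is_refused(void)
{
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    char *existing = mmap(NULL, len, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (existing == MAP_FAILED)
        return -1;
    existing[0] = 'A';

    void *p = map_exactly_at(existing, len);
    int ok = (p == MAP_FAILED && errno == EEXIST && existing[0] == 'A');

    munmap(existing, len);
    return ok;
}
```

A check like this is not atomic with respect to other threads mapping memory at the same time, which is one reason a kernel-side flag is preferable to doing it in user space.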
This change is pretty much ready to go, and Hocko has requested that it be merged. There is, however, the vital issue which has caused the most discussion about this patch series: the naming of MAP_FIXED_SAFE. For various reasons, various developers wanted a different name. Suggestions included MAP_FIXED_UNIQUE, MAP_FIXED_NOREPLACE, MAP_FIXED_NO_CLOBBER, MAP_TANTRUM, MAP_EXACT, MAP_NOFORCE, and quite a few others. It was just the sort of discussion that results when the technical issues are resolved, but everybody wants to put their stamp on the final result.
After enduring a fair amount of that discussion, Hocko made his own decision on the naming:
He also stated that anybody who was truly unhappy with the name was welcome to block the patch and somehow build a consensus around a better one, but that he was done with it. So, naturally, somebody objected, and Hocko wished him luck carrying the patch set forward.
Given the personalities involved, one might think that a useful patch will
end up simply blocked at this point. Your editor would wager, though, that
the MAP_FIXED_SAFE patches will be merged in something close to
their current form. They address a real problem; holding them up while
waiting for the perfect name does not seem like an approach that will do
anybody any good.
Index entries for this article | |
---|---|
Kernel | Memory management |
Security | Linux kernel |
Posted Dec 14, 2017 0:39 UTC (Thu)
by pr1268 (subscriber, #24648)
[Link] (16 responses)
Why not do the converse of that, i.e. make MAP_FIXED default to the MAP_SAFE behavior as defined and implemented by Mr. Hocko's patch, and add the separate flag, e.g. MAP_RIGHT_HERE for the legacy behavior? As an aside, I've used mmap(2) for years, and not once have I ever needed to use MAP_FIXED. (But I'm sure there's a good use for it somewhere...)
Posted Dec 14, 2017 1:33 UTC (Thu)
by wahern (subscriber, #37304)
[Link] (9 responses)
I once wrote a Bayesian SPAM token database that self-repaired file corruption. (The devices it ran on were seeing substantial hardware failures, which years later were tracked down to XFS bugs.) It used a red-black tree and thus internal pointers, and I very briefly considered MAP_FIXED; but even circa 2003 it just wasn't worth baking in such an anachronism, even disregarding the headaches it would impose on using multiple databases concurrently.
But the default semantics can't be changed because of the Linux backwards compatibility guarantee. Also, while I've never used MAP_FIXED nor see myself ever using it, I don't find the semantics of silent replacement odd. dup2 has the same semantics, and for the same reasons the semantics seem quite natural and intuitive to me. It's just that back when people regularly mmap'd stuff to fixed addresses, sbrk() was the primary (if not only) means of growing the heap, and static linking was the normal (if not only) method of linking.
Posted Dec 14, 2017 1:35 UTC (Thu)
by wahern (subscriber, #37304)
[Link]
Posted Dec 14, 2017 1:49 UTC (Thu)
by TheJH (subscriber, #101155)
[Link] (2 responses)
MAP_FIXED is how you properly, safely allocate virtually contiguous VMAs. This includes:
Try running strace on any binary that is dynamically linked and/or uses threads, and you should see MAP_FIXED.
Posted Dec 14, 2017 17:16 UTC (Thu)
by zlynx (guest, #2285)
[Link] (1 responses)
Posted Dec 14, 2017 19:30 UTC (Thu)
by ballombe (subscriber, #9523)
[Link]
Posted Dec 14, 2017 5:58 UTC (Thu)
by jreiser (subscriber, #11027)
[Link] (1 responses)
MAP_FIXED is necessary for the proper mapping by execve() of any native ET_EXEC file. The addresses are fixed, after all. There are ET_EXEC files which are several percent smaller and run several percent faster than the corresponding ET_DYN. Some environments have done a good risk-vs-benefit analysis, and for them ET_EXEC is worth it.
MAP_FIXED is necessary for the proper mapping by execve() and dlopen() of any native ET_DYN file which has more than one PT_LOAD, which is nearly all of them. The second and subsequent PT_LOADs must have fixed offsets from the first PT_LOAD. In most cases the best strategy is to use mmap(0, size, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) to obtain enough contiguous space to hold the convex hull of all PT_LOADs, then mmap() each PT_LOAD into that space using MAP_FIXED.
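The reserve-then-place strategy described above might be sketched as follows. This is an illustration under stated assumptions, not what any real loader does: anonymous memory stands in for the file-backed PT_LOAD segments, the four-page layout is arbitrary, and the function name reserve_then_place() is hypothetical.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if both segments landed at their fixed
 * offsets inside the reservation, 0 if not, -1 if the reservation failed. */
int reserve_then_place(void)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t total = 4 * page;

    /* Step 1: reserve one contiguous, inaccessible range large enough for
     * everything. The kernel picks the base address. */
    char *base = mmap(NULL, total, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return -1;

    /* Step 2: place each segment at its fixed offset from the base.
     * MAP_FIXED only replaces parts of the range reserved above, so
     * nothing belonging to anybody else can be clobbered. */
    char *seg0 = mmap(base, page, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    char *seg1 = mmap(base + 2 * page, page, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);

    int ok = (seg0 == base && seg1 == base + 2 * page);
    if (ok) {
        /* Both segments are writable; the gaps between them stay PROT_NONE. */
        seg0[0] = 1;
        seg1[0] = 2;
        ok = (seg0[0] == 1 && seg1[0] == 2);
    }

    /* One munmap() over the whole range releases the reservation and both
     * segments. */
    munmap(base, total);
    return ok;
}
```

The pages left at PROT_NONE between the two segments behave like the gaps between PT_LOADs: they keep the address range claimed without being accessible.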
MAP_FIXED is essential for program-manipulating programs such as valgrind and upx.
Posted Dec 14, 2017 19:14 UTC (Thu)
by roc (subscriber, #30627)
[Link]
Posted Dec 16, 2017 14:47 UTC (Sat)
by dtalen (guest, #86448)
[Link]
Posted Dec 20, 2017 22:24 UTC (Wed)
by mdenton (guest, #118411)
[Link]
Posted Dec 21, 2017 5:42 UTC (Thu)
by HelloWorld (guest, #56129)
[Link]
Posted Dec 14, 2017 1:44 UTC (Thu)
by TheJH (subscriber, #101155)
[Link]
Look at, for example, how glibc allocates memory for threads:
$ cat thread.c
MAP_FIXED is perfectly safe as long as you only pass in address ranges that you have allocated yourself - just like other APIs, such as munmap() or mprotect(), which you also shouldn't call on address ranges that haven't been allocated to you. Unlike MAP_FIXED, the suggested MAP_FIXED_SAFE is for a very uncommon use case and actually requires more care to get right. (Which is also why MAP_FIXED_SAFE isn't exactly a good name - it suggests that it is a "safer" alternative to MAP_FIXED, which it simply isn't in what I think is probably the majority of use cases.)
Posted Dec 14, 2017 8:53 UTC (Thu)
by epa (subscriber, #39769)
[Link] (3 responses)
Posted Dec 14, 2017 15:56 UTC (Thu)
by Sesse (subscriber, #53779)
[Link] (2 responses)
Posted Dec 16, 2017 10:57 UTC (Sat)
by cpitrat (subscriber, #116459)
[Link] (1 responses)
Posted Dec 16, 2017 13:01 UTC (Sat)
by epa (subscriber, #39769)
[Link]
Posted Dec 14, 2017 15:58 UTC (Thu)
by magfr (subscriber, #16052)
[Link]
Posted Dec 15, 2017 10:48 UTC (Fri)
by ortalo (guest, #4654)
[Link] (6 responses)
Posted Dec 16, 2017 0:56 UTC (Sat)
by mogendavido (guest, #99770)
[Link] (3 responses)
Posted Dec 16, 2017 10:58 UTC (Sat)
by cpitrat (subscriber, #116459)
[Link] (1 responses)
Posted Dec 19, 2017 9:53 UTC (Tue)
by ortalo (guest, #4654)
[Link]
Posted Dec 19, 2017 9:50 UTC (Tue)
by ortalo (guest, #4654)
[Link]
Posted Dec 18, 2017 13:46 UTC (Mon)
by shane (subscriber, #3335)
[Link] (1 responses)
Posted Dec 19, 2017 9:56 UTC (Tue)
by ortalo (guest, #4654)
[Link]
How about this?
Some had suggested adding a separate flag to modify the behavior of MAP_FIXED, so that applications would pass something like MAP_FIXED|MAP_SAFE to mmap(). The problem with that approach is that mmap() is one of those system calls that never checked for unknown flags.
- guard pages for thread stacks
- the different sections for library mappings (you need separate VMAs for code, readonly data, copy-on-write data and zero-initialized data)
Using MAP_FIXED allows the memory not to be counted as committed.
Sometimes this is useful.
MAP_FIXED is necessary for almost all execve() and dlopen()
the problem isn't MAP_FIXED...
#include <pthread.h>
void *testfn(void *arg) { return (void*)0; }
int main(void) {
    pthread_t thread;
    pthread_create(&thread, NULL, testfn, (void*)0);
    return 0;
}
$ gcc -o thread thread.c -Wall -pthread
$ strace ./thread
[...]
mmap(NULL, 3795360, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f9b24276000
mprotect(0x7f9b2440b000, 2097152, PROT_NONE) = 0
mmap(0x7f9b2460b000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x195000) = 0x7f9b2460b000
mmap(0x7f9b24611000, 14752, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f9b24611000
[...]
Terminology wisdom
"What's in a name? That which we call a rose
by any other word would smell as sweet;"
That's the appropriate time of year for dreaming, no?