LWN: Comments on "KAISER: hiding the kernel from user space"
https://lwn.net/Articles/738975/
This is a special feed containing comments posted
to the individual LWN article titled "KAISER: hiding the kernel from user space".
en-usMon, 28 Oct 2024 09:05:23 +0000Mon, 28 Oct 2024 09:05:23 +0000https://www.rssboard.org/rss-specificationKAISER: hiding the kernel from user space
https://lwn.net/Articles/774398/
https://lwn.net/Articles/774398/dembego3<div class="FormattedComment">
Thank you for caring for my safety:-)<br>
</div>
Sun, 09 Dec 2018 15:58:07 +0000KAISER: hiding the kernel from user space
https://lwn.net/Articles/743368/
https://lwn.net/Articles/743368/nix<div class="FormattedComment">
Of course, that tuning step already requires sysadmin intervention to turn off CPU frequency changes, ideally hyperthreading, etc: adding one more intervention isn't likely to be terribly difficult.<br>
</div>
Sat, 06 Jan 2018 16:02:06 +0000KAISER: hiding the kernel from user space
https://lwn.net/Articles/743314/
https://lwn.net/Articles/743314/ridethewave<div class="FormattedComment">
<font class="QuotedText">> I guess the main problem with that idea is that page tables take 8 bytes of physical memory per 4KB of virtual address space</font><br>
Couldn't you just map each virtual address to the same physical address then?<br>
</div>
Sat, 06 Jan 2018 00:26:03 +0000KAISER: hiding the kernel from user space
https://lwn.net/Articles/743158/
https://lwn.net/Articles/743158/clbuttic<div class="FormattedComment">
ATLAS (Automatically Tuned Linear Algebra Software) does a lengthy tuning step that expects consistent CPU performance.<br>
</div>
Fri, 05 Jan 2018 08:26:24 +0000KAISER: hiding the kernel from user space
https://lwn.net/Articles/742686/
https://lwn.net/Articles/742686/excors<div class="FormattedComment">
I guess the main problem with that idea is that page tables take 8 bytes of physical memory per 4KB of virtual address space. If you want to fill up the whole ~48-bit virtual address space with distinct PTEs, you'd need 512GB of page tables.<br>
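<p>
Here is a minimal sketch of the arithmetic behind the 512GB figure. It assumes x86-64 4-level paging with 4KB pages, 8-byte entries and 512 entries per table, and the helper names are made up for illustration. It also shows that the higher-level tables add only about 1GB, so the last-level PTEs dominate the cost.

```python
# Assumes x86-64 4-level paging: 4 KiB pages, 8-byte entries, 512 entries/table.
PAGE_SIZE = 4096                           # bytes mapped by one last-level PTE
PTE_SIZE = 8                               # bytes per page-table entry
ENTRIES_PER_TABLE = PAGE_SIZE // PTE_SIZE  # 512
GiB = 1 << 30

def pte_bytes(va_bits: int) -> int:
    """Bytes of last-level PTEs needed to give every page in 2**va_bits its own PTE."""
    pages = (1 << va_bits) // PAGE_SIZE
    return pages * PTE_SIZE

def upper_level_bytes(va_bits: int) -> int:
    """Bytes of the higher-level tables (PMD, PUD, PGD) pointing at those PTE pages."""
    total = 0
    tables = pte_bytes(va_bits) // PAGE_SIZE  # number of PTE table pages
    while tables > 1:
        tables = -(-tables // ENTRIES_PER_TABLE)  # ceiling division: parent tables
        total += tables * PAGE_SIZE
    return total

print(pte_bytes(48) // GiB)                   # 512
print(round(upper_level_bytes(48) / GiB, 3))  # 1.002
```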
<p>
You could try to reduce the size by e.g. using a single dummy PTE table that's shared by all the higher-level tables, instead of keeping them distinct. But an attacker can likely measure the timing difference between a page walk that fetches the PTE from cache, vs one that fetches it from RAM. If you access address A, then address A+4096, and the second one is fast (i.e. the PTE is already in the cache), you know that's using the dummy PTE, so it's still leaking information about where the kernel is.<br>
</div>
Wed, 03 Jan 2018 17:30:27 +0000KAISER: hiding the kernel from user space
https://lwn.net/Articles/742677/
https://lwn.net/Articles/742677/EdRowland<div class="FormattedComment">
Couldn't you map a dummy page into the holes to prevent timing differences between populated memory that's unreadable at ring 3 and unpopulated memory that now references a dummy page?<br>
</div>
Wed, 03 Jan 2018 16:55:22 +0000wordage nuance
https://lwn.net/Articles/740281/
https://lwn.net/Articles/740281/Garak<blockquote>Concerning the article, hyperbole is the standard in security news, but "a hardened kernel is no longer optional" seems to be a little extreme even so. I very much hope that stuff like this will be optional. </blockquote>
That was my reaction too, for a couple of seconds, until I read the next sentence. I agree that sentence is not the best way to describe things. I think it's important to highlight that security-vs-performance tradeoffs form a vast spectrum of subtle choices that *depend on the situation/deployment*. There are many different situations. Quite often, the performance hit from enabling SELinux, or whatever new hardening tactic costs five percent, is absolutely not worth it. Other times your computers are trying to secure millions of dollars of cryptocurrency or the like. Most users should be taught about such nuance rather than the "more secure is always better" narrative. If something is useful to lots of people, sure, it should be available as an option. But leave it to the distributors, and then the end users, to figure out when and where various options should be tuned.Thu, 30 Nov 2017 01:32:50 +0000KAISER: hiding the kernel from user space
https://lwn.net/Articles/740060/
https://lwn.net/Articles/740060/excors<div class="FormattedComment">
What stops an attacker from just running their test thousands or millions of times, to average out the randomisation that you've added?<br>
<p>
For example, the KAISER paper says the "double page fault attack" distinguishes page faults taking 12282 cycles for mapped pages and 12307 cycles for unmapped pages, i.e. a difference of 25 cycles. If I remember how maths works: You could add a random delay to the page fault handler (or randomly vary the CPU speed or whatever) so it has a mean and standard deviation of (M, S) for mapped pages and (M+25, S) for unmapped. If S > 25 (very roughly) then the attacker can measure a page fault but can't be sure whether it belongs to the first category or the second.<br>
<p>
But if they repeat it 10,000 times (which only takes tens of milliseconds) and average their measurements, they'd expect to get a number in the distribution (M, S/100) for mapped pages or (M+25, S/100) for unmapped, since averaging N samples divides the standard deviation by sqrt(N). You'd have to make S > 2500 to stop them from easily distinguishing those cases. At that point it's much more expensive than the KAISER defence, and it would still be useless against an attacker who can repeat the measurement a million times. And that's for measuring a relatively tiny difference of 25 cycles in an operation that takes ~12K cycles; it's harder to protect against the TSX or prefetch attacks, where the operation only takes ~300 cycles.<br>
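<p>
This small simulation illustrates the averaging argument. It uses the 12282/25-cycle figures quoted above and assumes Gaussian noise, which is an assumption: the comment only specifies a mean and a standard deviation. With S = 250, ten times the gap, the mean of 10,000 samples has a standard error of S/sqrt(N) = 2.5 cycles, so the two cases separate cleanly.

```python
import random
import statistics

# Figures from the comment: mapped-page faults take ~12282 cycles, unmapped ~25 more.
M, DELTA = 12282, 25

def averaged_measurement(true_cycles: float, noise_sd: float, n: int,
                         rng: random.Random) -> float:
    """Mean of n timings, each perturbed by defender-added Gaussian noise."""
    return statistics.fmean(rng.gauss(true_cycles, noise_sd) for _ in range(n))

def classify(avg_cycles: float) -> str:
    """Attacker's guess: threshold halfway between the two expected means."""
    return "mapped" if avg_cycles < M + DELTA / 2 else "unmapped"

S, N = 250, 10_000
print(S / N ** 0.5)  # standard error of the mean: 2.5 cycles
rng = random.Random(42)
print(classify(averaged_measurement(M, S, N, rng)))
print(classify(averaged_measurement(M + DELTA, S, N, rng)))
```

Setting S = 2500 in the same code makes the standard error equal to the 25-cycle gap, which is the threshold the comment arrives at.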
<p>
It seems much safer to ensure operations will always take a constant amount of time, rather than adding randomness and just hoping the statistics work in your favour.<br>
</div>
Mon, 27 Nov 2017 17:43:28 +0000KAISER: hiding the kernel from user space
https://lwn.net/Articles/740057/
https://lwn.net/Articles/740057/NAR<div class="FormattedComment">
It might hinder optimization if running the same code under the same circumstances results in different execution speeds...<br>
</div>
Mon, 27 Nov 2017 16:55:32 +0000KAISER: hiding the kernel from user space
https://lwn.net/Articles/740018/
https://lwn.net/Articles/740018/abufrejoval<div class="FormattedComment">
I'd like to see CPUs turn to randomized instruction timings to implement power control, and to throw a punch at timing-based attacks in the process.<br>
These days, when CPUs constantly vary their speed either to exploit every bit of thermal headroom they can find or to readjust constantly to hit an energy optimum for a limited-value workload, it seems almost stupid to try to stick to a constant speed.<br>
<p>
If instead you set a randomization bias, you can run CPUs at, say, a 5GHz logical clock and then add random delays to hit, say, 3, 2 or 1 GHz on average, depending on the workload. Every iteration of an otherwise pretty identical loop would wind up a couple of clocks different, throwing off snoop code without much of an impact elsewhere. Of course it shouldn't be one central clock overall, but essentially any clock domain could use its own randomization source and bias. I guess CPUs have vast numbers of clock synchronization gates these days anyway, so very little additional hardware should be required.<br>
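<p>
Below is a toy model of the proposed dithering. The names and the 5GHz/3GHz figures are illustrative assumptions, not real hardware parameters. Each tick of a fixed logical clock either does real work or becomes a stall, with the stall probability chosen so the average rate hits the target. Identical loop iterations then take different numbers of ticks, while the long-run rate still converges on the target.

```python
import random

LOGICAL_HZ = 5e9  # hypothetical fixed logical clock

def ticks_for_work(work_steps: int, target_hz: float, rng: random.Random) -> int:
    """Logical ticks needed to finish work_steps real steps when each tick is,
    at random, either a real step or an inserted stall."""
    p_step = target_hz / LOGICAL_HZ  # e.g. 3 GHz / 5 GHz = 0.6 of ticks do work
    ticks = done = 0
    while done < work_steps:
        ticks += 1
        if rng.random() < p_step:
            done += 1
    return ticks

rng = random.Random(0)
# Identical 100-step "loop iterations" take different numbers of ticks each time...
print([ticks_for_work(100, 3e9, rng) for _ in range(5)])
# ...but over a long run the average rate converges on the 3 GHz target.
work = 1_000_000
print(LOGICAL_HZ * work / ticks_for_work(work, 3e9, rng) / 1e9)  # close to 3.0
```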
<p>
Stupid, genius or simply old news?<br>
</div>
Mon, 27 Nov 2017 15:46:19 +0000KAISER: hiding the kernel from user space
https://lwn.net/Articles/739815/
https://lwn.net/Articles/739815/mlankhorst<div class="FormattedComment">
Can't it be done for nearly free using SMAP + SMEP?<br>
<p>
Put the kernel mapping at ring 3, and map the remainder of the upper half of the 64-bit address space RWX at ring 1 or 2.<br>
<p>
Try to exploit timing differences then from RING 0!<br>
<p>
Or am I thinking too simple?<br>
</div>
Wed, 22 Nov 2017 10:40:01 +0000KAISER: hiding the kernel from user space
https://lwn.net/Articles/739561/
https://lwn.net/Articles/739561/luto<div class="FormattedComment">
It could be TIF_KAISER, no?<br>
<p>
But this is definitely not a v1 feature.<br>
</div>
Sun, 19 Nov 2017 06:16:21 +0000KAISER: hiding the kernel from user space
https://lwn.net/Articles/739483/
https://lwn.net/Articles/739483/antonI read some performance caveats about vmaskmovps (AVX, not sure if there is an SSE equivalent) that make me think that this instruction can be used for such purposes, too.
<p>Concerning the article, hyperbole is the standard in security news, but "a hardened kernel is no longer optional" seems to be a little extreme even so. I very much hope that stuff like this will be optional.
<p>A possibly less costly way to mitigate attacks that try to defeat KASLR might be to map additional inaccessible address space that would respond to the attacks just like real kernel memory.Sat, 18 Nov 2017 00:09:37 +0000KAISER: hiding the kernel from user space
https://lwn.net/Articles/739440/
https://lwn.net/Articles/739440/valarauca<div class="FormattedComment">
Timing attacks are becoming the bane of x64.<br>
<p>
AVX512 adds explicit flags to suppress memory errors on scatter/gather load/store vectorized instructions which will just add another method to exploit this. The ways of _accessing_ memory you can't access on x64 just continue to grow. I really don't see how AMD64 can fix this without breaking either the page table or the debug timers.<br>
</div>
Fri, 17 Nov 2017 17:45:32 +0000KAISER: hiding the kernel from user space
https://lwn.net/Articles/739366/
https://lwn.net/Articles/739366/hansendc<div class="FormattedComment">
<font class="QuotedText">> if a syscall is about to become as expensive as IPC on L4... would the (theoretical) performance of the respective kernels be similar after KAISER?</font><br>
<p>
Most of the KAISER performance impact is purely from the cost of manipulating the hardware. L4 and other kernels would pay the same cost Linux would.<br>
<p>
It's not fair to compare a non-hardened kernel to a hardened one, though. It's apples-to-oranges.<br>
</div>
Thu, 16 Nov 2017 20:52:00 +0000KAISER: hiding the kernel from user space
https://lwn.net/Articles/739347/
https://lwn.net/Articles/739347/hansendc<div class="FormattedComment">
Yes, this could be done, at least theoretically. But, the contexts where we have to decide to "do KAISER" or not are very tricky. We don't have a stack and don't have registers to clobber, so it's tricky to pull off.<br>
<p>
You would essentially need to keep a bit of per-cpu data that was consulted very early in assembly at kernel entry. It would have to be updated at every context switch, probably from some flag in the task_struct. Again, doable, but far from trivial.<br>
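<p>
Here is a toy model of that scheme, written in Python rather than entry assembly. Task, Cpu and the kaiser flag are hypothetical names, loosely following the TIF_KAISER idea raised in this thread; they are not real kernel interfaces. The point is that kernel entry consults a single per-CPU value, and every context switch must keep that value in sync with the incoming task.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Task:
    name: str
    kaiser: bool  # hypothetical per-task flag, e.g. set for a browser, clear for gcc

class Cpu:
    """One CPU with a per-CPU copy of the current task's flag."""

    def __init__(self) -> None:
        self.current: Optional[Task] = None
        self.kaiser_active = True  # default to the safe choice

    def context_switch(self, nxt: Task) -> None:
        # Must run on every switch so the per-CPU copy never goes stale.
        self.current = nxt
        self.kaiser_active = nxt.kaiser

    def kernel_entry(self) -> str:
        # In real entry code this would be one per-CPU memory load, made
        # before any stack or spare register is available.
        if self.kaiser_active:
            return "switch CR3 to full kernel page tables"
        return "no CR3 switch: kernel already mapped"

cpu = Cpu()
cpu.context_switch(Task("browser", kaiser=True))
print(cpu.kernel_entry())  # switch CR3 to full kernel page tables
cpu.context_switch(Task("gcc", kaiser=False))
print(cpu.kernel_entry())  # no CR3 switch: kernel already mapped
```

The model ignores the harder part: an opted-out task's page tables would also have to keep the full kernel mapping.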
</div>
Thu, 16 Nov 2017 20:40:57 +0000KAISER: hiding the kernel from user space
https://lwn.net/Articles/739355/
https://lwn.net/Articles/739355/ttelford<i> Just the new instructions (CR3 manipulation) add a few hundred cycles to a syscall or interrupt</i><br/><br/>
A few hundred cycles to a syscall or interrupt is vaguely similar to the <a href=http://l4hq.org/docs/performance.php>basic IPC cost</a> of the L4 microkernel. (200-300 cycles for amd64).<br/><br/>
Kernels are not my area of expertise, so I have to ask: if a syscall is about to become as expensive as IPC on L4... would the (theoretical) performance of the respective kernels be similar after KAISER?<br>Thu, 16 Nov 2017 18:34:09 +0000KAISER: hiding the kernel from user space
https://lwn.net/Articles/739231/
https://lwn.net/Articles/739231/ballombe<div class="FormattedComment">
So maybe pushing this will motivate architecture designers to provide this feature, as was done for virtualization.<br>
</div>
Thu, 16 Nov 2017 12:52:09 +0000KAISER: hiding the kernel from user space
https://lwn.net/Articles/739235/
https://lwn.net/Articles/739235/Cyberax<div class="FormattedComment">
Sigh. Kernel root holes are found every month or so, but to exploit them reliably you need to know the kernel memory layout, and the most obvious software leaks of this information are now closed.<br>
<p>
The problem is that hardware simply makes all software countermeasures irrelevant without something like KAISER.<br>
</div>
Thu, 16 Nov 2017 10:41:09 +0000KAISER: hiding the kernel from user space
https://lwn.net/Articles/739228/
https://lwn.net/Articles/739228/alkbyby<div class="FormattedComment">
"in case" doesn't seem enough justification to pay such a massive price.<br>
</div>
Thu, 16 Nov 2017 09:29:06 +0000KAISER: hiding the kernel from user space
https://lwn.net/Articles/739220/
https://lwn.net/Articles/739220/Cyberax<div class="FormattedComment">
Kernel address hiding is needed to protect the kernel in case there's a bug that allows code execution in kernel mode.<br>
<p>
But it looks like software-based hiding is ineffective by itself with the current model.<br>
</div>
Thu, 16 Nov 2017 08:08:11 +0000KAISER: hiding the kernel from user space
https://lwn.net/Articles/739215/
https://lwn.net/Articles/739215/alkbyby<div class="FormattedComment">
It looks like something bad is coming, such as a mega-hole, maybe in hardware, that can be mitigated by hiding kernel addresses.<br>
<p>
Otherwise I cannot see why hiding kernel addresses better suddenly becomes important enough to spend a massive amount of CPU on it.<br>
</div>
Thu, 16 Nov 2017 07:21:29 +0000KAISER: hiding the kernel from user space
https://lwn.net/Articles/739210/
https://lwn.net/Articles/739210/Cyberax<div class="FormattedComment">
Can KAISER be made optional on a per-process level? Perhaps through a cgroup?<br>
<p>
I would definitely like to protect my browser and anything started by it, but I would like my gcc started from a terminal to run at full speed.<br>
</div>
Thu, 16 Nov 2017 04:57:55 +0000KAISER: hiding the kernel from user space
https://lwn.net/Articles/739200/
https://lwn.net/Articles/739200/jamesmorris<div class="FormattedComment">
Also, it's an architectural feature, so it's true on Linux.<br>
</div>
Thu, 16 Nov 2017 02:10:55 +0000KAISER: hiding the kernel from user space
https://lwn.net/Articles/739198/
https://lwn.net/Articles/739198/jamesmorris<div class="FormattedComment">
Correct for SPARC. The user and kernel address spaces are separate.<br>
</div>
Thu, 16 Nov 2017 02:06:30 +0000KAISER: hiding the kernel from user space
https://lwn.net/Articles/739122/
https://lwn.net/Articles/739122/matthias<div class="FormattedComment">
This will not work, as today's CPUs make it possible to trigger a page fault without involving the kernel (e.g. with TSX instructions). These instructions simply abort if the memory is not mapped or not accessible, unlike ordinary memory accesses, which would invoke the kernel's page-fault handler.<br>
<p>
I did not know this before either, but several of these attacks are described in the linked paper.<br>
</div>
Wed, 15 Nov 2017 15:55:42 +0000KAISER: hiding the kernel from user space
https://lwn.net/Articles/739119/
https://lwn.net/Articles/739119/hansendc<div class="FormattedComment">
Interrupts already cost a few thousand cycles, even quick ones. The entry plus IRET costs alone probably eclipse the (new) CR3 manipulation cost. So, while this CR3 manipulation makes things worse, it does not fundamentally change the speed of an interrupt.<br>
</div>
Wed, 15 Nov 2017 15:39:54 +0000KAISER: hiding the kernel from user space
https://lwn.net/Articles/739114/
https://lwn.net/Articles/739114/cborni<div class="FormattedComment">
I think both SPARC (I am sure about Solaris, but not Linux) and s390 (here I am sure about Linux) have separate kernel and user address spaces.<br>
</div>
Wed, 15 Nov 2017 14:56:26 +0000KAISER: hiding the kernel from user space
https://lwn.net/Articles/739112/
https://lwn.net/Articles/739112/epa<div class="FormattedComment">
If these timing-based attacks involve accessing a page in the kernel address space and getting some kind of memory protection fault, can't the kernel add a small random delay each time such a fault is hit before control returns to user space? The delay could even increase with subsequent faults, imposing a ceiling on how many faults the process can generate. That is, provided there's a way to do this while not imposing that same delay on processes that are using these faults for other things, like memory-mapped files.<br>
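<p>
Below is a minimal sketch of the escalating-delay idea. The policy, function names and numbers (1µs base, 10ms cap) are hypothetical and do not describe an existing kernel feature. The bound on a random delay doubles with every earlier fault until it hits a cap. Because each fault then costs at least half the cap, the process gets a hard ceiling on its fault rate.

```python
import random
from typing import Optional

def fault_delay_us(prior_faults: int, base_us: float = 1.0, cap_us: float = 10_000.0,
                   rng: Optional[random.Random] = None) -> float:
    """Random delay for a kernel-address fault whose bound doubles per prior fault."""
    rng = rng or random.Random()
    # Clamp the exponent so huge fault counts cannot overflow float conversion.
    upper = min(cap_us, base_us * 2 ** min(prior_faults, 60))
    return rng.uniform(upper / 2, upper)  # random, but at least half the bound

def max_faults_per_second(cap_us: float = 10_000.0) -> float:
    """Ceiling on the fault rate once the bound is capped: each fault costs >= cap/2."""
    return 1_000_000 / (cap_us / 2)

rng = random.Random(0)
print([round(fault_delay_us(k, rng=rng), 1) for k in range(4)])  # grows roughly 2x per fault
print(max_faults_per_second())  # 200.0
```

As the comment notes, faults taken for legitimate reasons (for example on memory-mapped files) would have to bypass this. The sketch does not model that.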
</div>
Wed, 15 Nov 2017 14:19:16 +0000KAISER: hiding the kernel from user space
https://lwn.net/Articles/739111/
https://lwn.net/Articles/739111/arjan<div class="FormattedComment">
I think you might mean s390<br>
</div>
Wed, 15 Nov 2017 14:05:56 +0000KAISER: hiding the kernel from user space
https://lwn.net/Articles/739100/
https://lwn.net/Articles/739100/ballombe<div class="FormattedComment">
<font class="QuotedText">> Since the beginning, Linux has mapped the kernel's memory into the address space of every running process. </font><br>
<p>
IIRC this is not true on SPARC, is it?<br>
</div>
Wed, 15 Nov 2017 11:25:18 +0000KAISER: hiding the kernel from user space
https://lwn.net/Articles/739091/
https://lwn.net/Articles/739091/sorokin<div class="FormattedComment">
<font class="QuotedText">> Most workloads that we have run show single-digit regressions. 5% is a good round number for what is typical.</font><br>
<p>
A single change may not affect performance significantly (although a 5% slowdown is too much for my taste), but multiple changes can stack up over time. This has been seen in the past both for performance-improving and for performance-regressing changes. A regression example is how compilers' performance degraded over time (although since GCC 6 the trend has reversed). An improvement example is SQLite.<br>
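<p>
Here is a quick illustration of the stacking argument. It assumes, as a simplification, that each change's regression is independent and compounds multiplicatively.

```python
def cumulative_slowdown(per_change: float, changes: int) -> float:
    """Runtime multiplier after `changes` regressions of `per_change` each (0.05 == 5%),
    assuming they compound multiplicatively."""
    return (1 + per_change) ** changes

for n in (1, 5, 10):
    print(n, round(cumulative_slowdown(0.05, n), 3))
# 1 1.05
# 5 1.276
# 10 1.629
```

Under this assumption, ten separate 5% regressions make a workload roughly 63% slower, not 50%.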
</div>
Wed, 15 Nov 2017 09:27:57 +0000KAISER: hiding the kernel from user space
https://lwn.net/Articles/739087/
https://lwn.net/Articles/739087/marcH<div class="FormattedComment">
<font class="QuotedText">> a number of hardware-based attacks on KASLR. They use techniques like exploiting timing differences in fault handling, </font><br>
<p>
I find timing-based attacks fascinating.<br>
<p>
In order to grow, computer performance has become less and less deterministic. On one hand, this makes real-time guarantees and performance predictions harder. On the other hand, it leaks more and more information about the system.<br>
</div>
Wed, 15 Nov 2017 08:19:44 +0000KAISER: hiding the kernel from user space
https://lwn.net/Articles/739075/
https://lwn.net/Articles/739075/jreiser<i>(CR3 manipulation) add a few hundred cycles to a syscall or interrupt.</i> That's a couple of L3 cache misses [CAS latency on SDRAM has been ~60ns for decades], which is probably tolerable for a syscall. But hundreds of cycles is <b>horrible</b> for an interrupt. [33MHz is a common bus clock, so just generating an interrupt already requires an average latency of ~15ns, half of a ~30ns bus cycle.] Some architectures have a special interrupt context (and/or separate small locked caches) exactly for this reason.Wed, 15 Nov 2017 05:54:20 +0000
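<p>
This is a back-of-the-envelope check of the numbers in the comment above. It assumes a 3GHz core clock and takes 300 cycles as "a few hundred"; the comment states neither. The ~60ns miss cost and the 33MHz bus clock come from the comment itself.

```python
def cycles_to_ns(cycles: float, clock_ghz: float) -> float:
    """Convert CPU cycles to nanoseconds (1 GHz == 1 cycle per ns)."""
    return cycles / clock_ghz

def avg_bus_wait_ns(bus_mhz: float) -> float:
    """Average wait for the next bus-clock edge: half a period, for a random arrival."""
    return 1_000.0 / bus_mhz / 2

CPU_GHZ = 3.0        # assumed core clock
CR3_CYCLES = 300     # assumed value for "a few hundred cycles"
DRAM_MISS_NS = 60.0  # the ~60ns figure quoted in the comment

cost_ns = cycles_to_ns(CR3_CYCLES, CPU_GHZ)
print(cost_ns)                           # 100.0
print(round(cost_ns / DRAM_MISS_NS, 2))  # 1.67 misses' worth
print(round(avg_bus_wait_ns(33), 1))     # 15.2
```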