LWN: Comments on "KAISER: hiding the kernel from user space"

KAISER: hiding the kernel from user space

dembego3 — Sun, 09 Dec 2018 15:58:07 +0000

Thank you for caring for my safety:-)

KAISER: hiding the kernel from user space

nix — Sat, 06 Jan 2018 16:02:06 +0000

Of course, that tuning step already requires sysadmin intervention to turn off CPU frequency changes, ideally hyperthreading, etc: adding one more intervention isn't likely to be terribly difficult.

KAISER: hiding the kernel from user space

ridethewave — Sat, 06 Jan 2018 00:26:03 +0000

> I guess the main problem with that idea is that page tables take 8 bytes of physical memory per 4KB of virtual address space
Couldn't you just map each virtual address to the same physical address then?

KAISER: hiding the kernel from user space

clbuttic — Fri, 05 Jan 2018 08:26:24 +0000

ATLAS (Automatically Tuned Linear Algebra Software) does a lengthy tuning step that expects consistent CPU performance.

KAISER: hiding the kernel from user space

excors — Wed, 03 Jan 2018 17:30:27 +0000

I guess the main problem with that idea is that page tables take 8 bytes of physical memory per 4KB of virtual address space. If you want to fill up the whole ~48-bit virtual address space with distinct PTEs, you'd need 512GB of page tables.
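As a quick check of the 512GB figure, the arithmetic can be sketched in Python. The function name and its defaults are illustrative only, and the sketch counts just the last-level PTEs (the upper-level tables would add a little more on top).

```python
def leaf_page_table_bytes(va_bits=48, page_bytes=4096, pte_bytes=8):
    """Bytes of last-level page-table entries needed to map every page distinctly."""
    pages = 2**va_bits // page_bytes   # number of 4KB pages in the address space
    return pages * pte_bytes           # one 8-byte PTE per page

total = leaf_page_table_bytes()
print(total // 2**30, "GiB of PTEs")
```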

You could try to reduce the size by e.g. using a single dummy PTE table that's shared by all the higher-level tables, instead of keeping them distinct. But an attacker can likely measure the timing difference between a page walk that fetches the PTE from cache, vs one that fetches it from RAM. If you access address A, then address A+4096, and the second one is fast (i.e. the PTE is already in the cache), you know that's using the dummy PTE, so it's still leaking information about where the kernel is.

KAISER: hiding the kernel from user space

EdRowland — Wed, 03 Jan 2018 16:55:22 +0000

Couldn't you map a dummy page into the holes to prevent timing differences between populated memory that's unreadable at ring 3 and unpopulated memory that now references a dummy page?

wordage nuance

Garak — Thu, 30 Nov 2017 01:32:50 +0000

> Concerning the article, hyperbole is the standard in security news, but "a hardened kernel is no longer optional" seems to be a little extreme even so. I very much hope that stuff like this will be optional.

That was my reaction for a couple of seconds as well, till I read the next sentence. I agree that sentence is not the best way to describe things. I think it's important to highlight that security-vs-performance tradeoffs are a vast spectrum of subtle choices that *depend on the situation/deployment*. There are many different situations. Quite often a performance hit from enabling SELinux or whatever new hardening-with-five-percent-hit tactic is absolutely not worth it. Other times your computers are trying to secure millions of dollars of cryptocurrency/etc. Most users should be taught about such nuance versus the "more secure equals always better" narrative. If something is useful to lots of people, sure it should be available as an option. But leave it to the distributors and then the end users to figure out when and where various options should be tuned.

KAISER: hiding the kernel from user space

excors — Mon, 27 Nov 2017 17:43:28 +0000

What stops an attacker from just running their test thousands or millions of times, to average out the randomisation that you've added?

For example, the KAISER paper says the "double page fault attack" distinguishes page faults taking 12282 cycles for mapped pages and 12307 cycles for unmapped pages, i.e. a difference of 25 cycles. If I remember how maths works: You could add a random delay to the page fault handler (or randomly vary the CPU speed or whatever) so it has a mean and standard deviation of (M, S) for mapped pages and (M+25, S) for unmapped. If S > 25 (very roughly) then the attacker can measure a page fault but can't be sure whether it belongs to the first category or the second.

But if they repeat it 10,000 times (which only takes a few msecs) and average their measurements, they'd expect to get a number in the distribution (M, S/100) for mapped pages or (M+25, S/100) for unmapped. You'd have to make S > 2500 to stop them being able to distinguish those cases easily. At that point it's much more expensive than the KAISER defence, and it would still be useless against an attacker who can repeat the measurement a million times. And that's for measuring a relatively tiny difference of 25 cycles in an operation that takes ~12K cycles; it's harder to protect the TSX or prefetch attacks where the operation only takes ~300 cycles.

It seems much safer to ensure operations will always take a constant amount of time, rather than adding randomness and just hoping the statistics work in your favour.
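The averaging argument above can be tried out with a small simulation. This is only a sketch under stated assumptions: the noise is modelled as Gaussian, S = 100 is an arbitrary choice satisfying S > 25, the attacker is assumed to know the midpoint threshold, and all names are made up for illustration. Only the 12282 and 25 cycle figures come from the comment.

```python
import random

M = 12282                 # mean cycles for a mapped page (figure quoted above)
DELTA = 25                # extra cycles for an unmapped page (figure quoted above)
S = 100                   # assumed std dev of the injected random delay
THRESHOLD = M + DELTA / 2 # attacker's decision boundary (assumed known)

def measure(unmapped, rng):
    """One noisy page-fault timing, in cycles."""
    return rng.gauss(M + (DELTA if unmapped else 0), S)

def guess_unmapped(unmapped, repeats, rng):
    """Average `repeats` timings and compare the mean with the midpoint."""
    mean = sum(measure(unmapped, rng) for _ in range(repeats)) / repeats
    return mean > THRESHOLD

def accuracy(repeats, trials, rng):
    """Fraction of correct guesses over `trials` mapped and `trials` unmapped pages."""
    correct = 0
    for _ in range(trials):
        for truth in (False, True):
            correct += guess_unmapped(truth, repeats, rng) == truth
    return correct / (2 * trials)

rng = random.Random(0)
single = accuracy(repeats=1, trials=2000, rng=rng)
averaged = accuracy(repeats=10_000, trials=20, rng=rng)
print(f"1 sample: {single:.2f} correct, 10,000 samples: {averaged:.2f} correct")
```

With a single sample the guess should be little better than a coin flip, while averaging 10,000 samples shrinks the spread to about S/100 and makes the 25-cycle gap easy to see.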

KAISER: hiding the kernel from user space

NAR — Mon, 27 Nov 2017 16:55:32 +0000

It might hinder optimization if running the same code under the same circumstances results in different execution speed...

KAISER: hiding the kernel from user space

abufrejoval — Mon, 27 Nov 2017 15:46:19 +0000

I'd like to see CPUs turn to randomized instruction timings to implement power control and throw a punch at timing-based attacks this way.
These days, when CPUs constantly vary their speeds to either exploit every bit of thermal headroom they can find or re-adjust to hit an energy optimum for a limited-value workload, it seems almost stupid to try sticking to a constant speed.

If instead you set a randomization bias you can run CPUs at say 5GHz logical clock and then add random delays to hit say 3, 2 or 1 GHz on average depending on the workload. Every iteration of an otherwise pretty identical loop would wind up a couple of clocks different, throwing off snoop code without much of an impact elsewhere. Of course it shouldn't be one central clock overall, but essentially any clock domain could use its own randomization source and bias. I guess CPUs have vast numbers of clock synchronization gates these days anyway, so very little additional hardware should be required.

Stupid, genius or simply old news?

KAISER: hiding the kernel from user space

mlankhorst — Wed, 22 Nov 2017 10:40:01 +0000

Can't it be done for nearly free using SMAP + SMEP?

Put the kernel mapping at ring3, and make the remainder of the upper 64-bits mapped RWX at ring 1 or 2.

Try to exploit timing differences then from RING 0!

Or am I thinking too simple?

KAISER: hiding the kernel from user space

luto — Sun, 19 Nov 2017 06:16:21 +0000

It could be TIF_KAISER, no?

But this is definitely not a v1 feature.

KAISER: hiding the kernel from user space

anton — Sat, 18 Nov 2017 00:09:37 +0000

I read some performance caveats about vmaskmovps (AVX, not sure if there is an SSE equivalent) that make me think that this instruction can be used for such purposes, too.

Concerning the article, hyperbole is the standard in security news, but "a hardened kernel is no longer optional" seems to be a little extreme even so. I very much hope that stuff like this will be optional.

A possibly less costly way to mitigate attacks that try to defeat KASLR might be to map additional inaccessible address space that would respond to the attacks just like real kernel memory.

KAISER: hiding the kernel from user space

valarauca — Fri, 17 Nov 2017 17:45:32 +0000

Timing attacks are becoming the bane of x64.

AVX512 adds explicit flags to suppress memory errors on scatter/gather load/store vectorized instructions which will just add another method to exploit this. The ways of _accessing_ memory you can't access on x64 just continue to grow. I really don't see how AMD64 can fix this without breaking either the page table or the debug timers.

KAISER: hiding the kernel from user space

hansendc — Thu, 16 Nov 2017 20:52:00 +0000

> if a syscall is about to become as expensive as IPC on L4... would the (theoretical) performance of the respective kernels be similar after KAISER?

Most of the KAISER performance impact is purely from the cost of manipulating the hardware. L4 and other kernels would pay the same cost Linux would.

It's not fair to compare a non-hardened kernel to a hardened one, though. It's apples-to-oranges.

KAISER: hiding the kernel from user space

hansendc — Thu, 16 Nov 2017 20:40:57 +0000

Yes, this could be done, at least theoretically. But, the contexts where we have to decide to "do KAISER" or not are very tricky. We don't have a stack and don't have registers to clobber, so it's tricky to pull off.

You would essentially need to keep a bit of per-cpu data that was consulted very early in assembly at kernel entry. It would have to be updated at every context switch, probably from some flag in the task_struct. Again, doable, but far from trivial.

KAISER: hiding the kernel from user space

ttelford — Thu, 16 Nov 2017 18:34:09 +0000

> Just the new instructions (CR3 manipulation) add a few hundred cycles to a syscall or interrupt

A few hundred cycles to a syscall or interrupt is vaguely similar to the basic IPC cost of the L4 microkernel. (200-300 cycles for amd64).

Kernels are not my area of expertise, so I have to ask: if a syscall is about to become as expensive as IPC on L4... would the (theoretical) performance of the respective kernels be similar after KAISER?

KAISER: hiding the kernel from user space

ballombe — Thu, 16 Nov 2017 12:52:09 +0000

So maybe pushing this will motivate architecture designers to provide this feature, like it was done for virtualization.

KAISER: hiding the kernel from user space

Cyberax — Thu, 16 Nov 2017 10:41:09 +0000

Sigh. Kernel root holes are being found every month or so. But in order to exploit them reliably you need to know the kernel memory layout. And most obvious software leaks of this information are now closed.

The problem is that hardware simply makes all software countermeasures irrelevant without something like KAISER.

KAISER: hiding the kernel from user space

alkbyby — Thu, 16 Nov 2017 09:29:06 +0000

"in case" doesn't seem enough justification to pay such a massive price.

KAISER: hiding the kernel from user space

Cyberax — Thu, 16 Nov 2017 08:08:11 +0000

Kernel address hiding is needed to protect the kernel in case there's a bug that allows code execution in the kernel mode.

But it looks like software-based hiding is ineffective by itself with the current model.

KAISER: hiding the kernel from user space

alkbyby — Thu, 16 Nov 2017 07:21:29 +0000

Looks like something bad is coming. Maybe a mega-hole in hardware that can be mitigated by hiding kernel addresses.

Otherwise I cannot see why simply hiding kernel addresses better suddenly becomes important enough to spend a massive amount of CPU on it.

KAISER: hiding the kernel from user space

Cyberax — Thu, 16 Nov 2017 04:57:55 +0000

Can KAISER be made optional at the process level? Perhaps through a cgroup?

I would definitely like to protect my browser and anything started by it, but I would like my gcc started from a terminal to run at full speed.

KAISER: hiding the kernel from user space

jamesmorris — Thu, 16 Nov 2017 02:10:55 +0000

Also, it's an architectural feature, so it's true on Linux.

KAISER: hiding the kernel from user space

jamesmorris — Thu, 16 Nov 2017 02:06:30 +0000

Correct for SPARC. The user and kernel address spaces are separate.

KAISER: hiding the kernel from user space

matthias — Wed, 15 Nov 2017 15:55:42 +0000

This will not work, as today's CPUs provide the possibility to trigger a page fault without involving the kernel (e.g. TSX instructions). These instructions simply fail if the memory is not mapped or not accessible, unlike usual memory accesses, which would involve the kernel's page-fault handler.

I also did not know this before, but several of these attacks are described in the linked paper.

KAISER: hiding the kernel from user space

hansendc — Wed, 15 Nov 2017 15:39:54 +0000

Interrupts already cost a small number of thousands of cycles, even for a quick one. The entry plus IRET costs alone probably eclipse the (new) CR3 manipulation cost. So, while this CR3 manipulation makes things worse, it does not fundamentally change the speed of an interrupt.

KAISER: hiding the kernel from user space

cborni — Wed, 15 Nov 2017 14:56:26 +0000

I think both SPARC (I am sure about Solaris, but not Linux) and s390 (here I am sure for Linux) have separate kernel and user address spaces.

KAISER: hiding the kernel from user space

epa — Wed, 15 Nov 2017 14:19:16 +0000

If these timing-based attacks involve accessing a page in the kernel address space and getting some kind of memory protection fault, can't the kernel add a small random delay each time such a fault is hit before control returns to user space? The delay could even increase with subsequent faults, imposing a ceiling on how many faults the process can generate. That is, provided there's a way to do this while not imposing that same delay on processes that are using these faults for other things, like memory-mapped files.

KAISER: hiding the kernel from user space

arjan — Wed, 15 Nov 2017 14:05:56 +0000

I think you might mean s390

KAISER: hiding the kernel from user space

ballombe — Wed, 15 Nov 2017 11:25:18 +0000

> Since the beginning, Linux has mapped the kernel's memory into the address space of every running process.

IIRC this is not true on SPARC, is it?

KAISER: hiding the kernel from user space

sorokin — Wed, 15 Nov 2017 09:27:57 +0000

> Most workloads that we have run show single-digit regressions. 5% is a good round number for what is typical.

A single change may not affect performance significantly (although a 5% slowdown is too much for my taste). But multiple changes can stack up over time. In the past this has been seen both for performance-improving and for performance-regressing changes. A performance regression example is how compilers' performance regressed over time (although since GCC 6 the trend has reversed). A performance improvement example is sqlite.

KAISER: hiding the kernel from user space

marcH — Wed, 15 Nov 2017 08:19:44 +0000

> a number of hardware-based attacks on KASLR. They use techniques like exploiting timing differences in fault handling,

I find timing-based attacks fascinating.

In order to grow, computer performance has become less and less deterministic. On one hand this makes real-time and predictions harder. On the other hand this leaks more and more information about the system.

KAISER: hiding the kernel from user space

jreiser — Wed, 15 Nov 2017 05:54:20 +0000

> (CR3 manipulation) add a few hundred cycles to a syscall or interrupt.

That's a couple of L3 cache misses [CAS latency on SDRAM has been ~60ns for decades] which probably is tolerable on a syscall. But hundreds of cycles is horrible for an interrupt. [33MHz is a common bus clock, so just generating an interrupt already requires an average latency of ~15ns.] Some architectures have a special interrupt context (and/or separate small locked caches) exactly for this reason.
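To put the figures in this comment on a common scale, here is a rough conversion sketch. The 3 GHz core clock and the reading of "a few hundred cycles" as 300 are assumptions made purely for illustration; only the ~60ns and 33MHz figures come from the comment, and the helper names are hypothetical.

```python
def cycles_to_ns(cycles, core_ghz):
    """Convert CPU cycles to nanoseconds at a given core clock (GHz)."""
    return cycles / core_ghz

def average_bus_wait_ns(bus_hz):
    """Average wait for the next bus clock edge: half of one bus period."""
    return 0.5 / bus_hz * 1e9

cr3_cost_ns = cycles_to_ns(300, core_ghz=3.0)  # assumed: 300 cycles at 3 GHz
two_misses_ns = 2 * 60                         # "a couple" of ~60ns misses
bus_wait_ns = average_bus_wait_ns(33e6)        # 33MHz bus clock

print(f"CR3 cost ~{cr3_cost_ns:.0f}ns, two misses ~{two_misses_ns}ns, "
      f"bus wait ~{bus_wait_ns:.1f}ns")
```

Under these assumptions the CR3 cost lands in the same range as two cache misses, and several times above the average bus wait the comment mentions.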