sys_membarrier()
As with the kernel version, user-space RCU works by deferring the cleanup of in-memory objects until it is known that no more references to those objects can exist. The implementation must be done differently, though, since user-space code is unable to run in the same atomic mode used by RCU in the kernel. So, in user space, a call to rcu_read_lock() sets a variable in shared memory indicating that the thread is in an RCU critical section. Within that critical section, it's safe for the thread to access RCU-protected variables.
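A minimal sketch of that idea in C11 follows. The names (struct reader_state, sketch_rcu_read_lock() and friends) are hypothetical and not taken from any real user-space RCU library, and the sketch deliberately ignores the ordering problem by using relaxed atomics; it only shows a reader announcing itself through a variable that an updater could inspect.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-thread reader state. A real library would keep one of
 * these per registered thread, somewhere updaters can find it. */
struct reader_state {
    atomic_uint nesting;    /* nonzero while inside a read-side critical section */
};

/* Announce that this thread is entering a critical section.
 * Relaxed ordering only: memory ordering is NOT handled in this sketch. */
static inline void sketch_rcu_read_lock(struct reader_state *rs)
{
    atomic_fetch_add_explicit(&rs->nesting, 1, memory_order_relaxed);
}

/* Leave the critical section; nesting is allowed, so just count down. */
static inline void sketch_rcu_read_unlock(struct reader_state *rs)
{
    unsigned int old =
        atomic_fetch_sub_explicit(&rs->nesting, 1, memory_order_relaxed);
    assert(old > 0);        /* an unlock without a matching lock is a bug */
    (void)old;
}

/* What an updater would check before freeing an old object. */
static inline bool sketch_reader_active(struct reader_state *rs)
{
    return atomic_load_explicit(&rs->nesting, memory_order_relaxed) != 0;
}
```

An updater would wait until sketch_reader_active() is false for every reader that could still hold a reference before freeing anything.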
...at least, it's safe as long as nobody reorders operations in a way that causes an access to happen to an RCU-protected variable before the effects of rcu_read_lock() are visible to other CPUs. That kind of reordering can indeed happen, at both the compiler and CPU levels, so it's a problem which must be addressed. Compile-time reordering is relatively easy to avoid, but runtime reordering in the CPU requires the issuing of a memory barrier instruction. And, indeed, user-space RCU can be made to work by putting memory barriers into the rcu_read_lock() call.
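The difference between the two kinds of barrier can be sketched with C11 fences. This is an illustration under assumptions, not the code used by any particular user-space RCU implementation: struct reader_slot and the barrier_rcu_*() names are invented here, and the full fence is simply the most conservative choice.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical per-thread flag that updaters can inspect. */
struct reader_slot {
    atomic_uint active;
};

/* Compiler-only barrier: keeps the compiler from moving memory accesses
 * across this point, but emits no instruction, so the CPU may still reorder. */
static inline void compiler_barrier(void)
{
    atomic_signal_fence(memory_order_seq_cst);
}

/* Full barrier: constrains the CPU as well; this is the expensive one. */
static inline void full_barrier(void)
{
    atomic_thread_fence(memory_order_seq_cst);
}

static inline void barrier_rcu_read_lock(struct reader_slot *s)
{
    atomic_store_explicit(&s->active, 1, memory_order_relaxed);
    full_barrier();     /* order the flag store before any protected access */
}

static inline void barrier_rcu_read_unlock(struct reader_slot *s)
{
    assert(atomic_load_explicit(&s->active, memory_order_relaxed) == 1);
    full_barrier();     /* finish protected accesses before clearing the flag */
    atomic_store_explicit(&s->active, 0, memory_order_relaxed);
}
```

Replacing full_barrier() with compiler_barrier() in the two functions above gives the cheap read side that the rest of the article is about; it is only correct if something else forces the CPU-level ordering when an update happens.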
The problem with that solution is that memory barriers slow things down significantly. Even worse, they slow down the read-side fast path to guard against an event (a change to an RCU-protected variable) which happens rarely. So Mathieu would like to get rid of that barrier. To that end, he coded up a solution which sends a signal to every thread when an RCU-protected variable is about to be changed, forcing each thread to execute a memory barrier at that time. This solution does speed things up, believe it or not, but signals are almost never the optimal solution to any problem. Mathieu would like to do something better.
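The signal-based scheme might look something like the following sketch. Everything specific here is an assumption made for illustration: the choice of SIGUSR1, the acknowledgment counter, and the function names are not taken from Mathieu's code. The updater signals each known reader thread, each handler executes a full barrier, and the updater waits until all of them have done so.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <signal.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

/* Number of threads that have run the handler since the last request. */
static atomic_int barrier_acks;

/* Runs in each signalled thread: execute the barrier, then acknowledge. */
static void barrier_handler(int sig)
{
    (void)sig;
    atomic_thread_fence(memory_order_seq_cst);
    atomic_fetch_add(&barrier_acks, 1);
}

/* Install the handler once, process-wide. SIGUSR1 is an arbitrary choice. */
static inline int install_barrier_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = barrier_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Updater side: make every listed thread execute a memory barrier.
 * Returns 0 on success, -1 if a signal could not be sent. */
static inline int force_barriers(const pthread_t *readers, size_t n)
{
    assert(readers != NULL || n == 0);
    atomic_store(&barrier_acks, 0);
    for (size_t i = 0; i < n; i++)
        if (pthread_kill(readers[i], SIGUSR1) != 0)
            return -1;
    while ((size_t)atomic_load(&barrier_acks) < n)
        sched_yield();              /* wait for every handler to run */
    atomic_thread_fence(memory_order_seq_cst);
    return 0;
}
```

The cost is visible in the structure: one signal delivery per thread on every update, whether or not that thread is currently running.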
His "something better" turned out to be a simple system call:
void membarrier();
The initial implementation would simply send an inter-processor interrupt to every CPU in the system; the receiving CPUs would respond by executing an explicit memory barrier instruction. The solution worked, but it ran into a couple of objections in review:
- By allowing a user-space program to force interrupts to all processors on the system, membarrier() presented an easy way to create denial-of-service attacks on the system.
- The system call interrupted every processor on the system. Interrupting processors running different applications is a small but pointless cost. The problem gets a little worse if some of those CPUs are running realtime tasks, which, presumably, would not welcome the forced addition of a bit of latency into their world. It would even interrupt processors which were currently sleeping - a useless exercise which would also increase energy use.
What followed was a long discussion on how to optimize the patch, whether an explicit memory barrier is needed even after the CPU has taken an inter-processor interrupt (the safe answer appears to be "yes"), and so on. All told, an impressive amount of effort has gone into the optimization of a small patch which is, at its core, implementing the slow path which should be rarely executed.
The current status, as of this writing, is that Mathieu has posted a new version of the patch with a number of changes. The first of those is the addition of an integer "expedited" parameter. If this value is zero, the system call simply calls synchronize_sched() and returns; this is the cheapest way of getting the needed functionality, but it comes at the cost of a latency of some milliseconds for the caller. It seems clear that Mathieu expects the "expedited" mode to be used most of the time.
For an expedited barrier, the system call will look at every CPU in the system, building a mask of those which are running in the same address space as the caller; those CPUs will then receive the inter-processor interrupt asking them to execute a memory barrier instruction. It's a rather more complicated implementation, but, since it only interrupts processors which are running the calling application, the denial of service, performance, and energy use concerns are no longer relevant. One assumes that the patch is getting close to its final form, but it's hard to say for sure: sometimes it's the smallest and simplest patches which are scrutinized the most.
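The mask-building step for the expedited case can be modeled in ordinary user-space C. This is a toy simulation, not kernel code: struct sketch_cpu, the integer mm_id standing in for an address space, the eight-CPU limit, and the decision to leave the calling CPU out of the mask are all assumptions made for the sake of the example.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NR_CPUS 8

/* Hypothetical stand-in for "the address space now running on this CPU".
 * mm_id 0 is used here to mean the CPU is idle. */
struct sketch_cpu {
    int mm_id;
};

/* Return a bitmask of the CPUs which would receive the inter-processor
 * interrupt: those running in the caller's address space. The caller's
 * own CPU is skipped, on the assumption that it can simply execute the
 * barrier itself. */
static inline uint32_t sketch_build_mask(const struct sketch_cpu *cpus,
                                         int self_cpu)
{
    assert(self_cpu >= 0 && self_cpu < SKETCH_NR_CPUS);

    int mm = cpus[self_cpu].mm_id;
    uint32_t mask = 0;

    for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++) {
        if (cpu == self_cpu)
            continue;
        if (cpus[cpu].mm_id == mm)
            mask |= UINT32_C(1) << cpu;
    }
    return mask;
}
```

With eight CPUs of which three run the calling process, only the other two are interrupted; CPUs that are idle or running unrelated applications are left alone, which is what addresses the objections raised against the first version.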
Posted Jan 18, 2010 12:48 UTC (Mon)
by ikm (guest, #493)
Posted Jan 20, 2010 23:26 UTC (Wed)
by compudj (subscriber, #43335)
Rephrasing what you said: "the idea is to make CPUs execute memory barriers only when synchronization is required by the updater thread, as opposed to executing them each time before and after accessing it from a reader thread".
Thanks,
Mathieu