The block I/O latency controller
Modern block devices are fast, especially when solid-state storage devices are in use. But some workloads can be even faster when it comes to the generation of block I/O requests. If a device fails to keep up, the length of the request queue(s) will increase, as will the time it takes for any specific request to complete. The slowdown is unwelcome in almost any setting, but the corresponding increase in latency can be especially problematic for latency-sensitive workloads.
The kernel has a block I/O controller now, but it has a number of shortcomings. It regulates bandwidth usage, not latency; that can be good in settings where users are being charged for higher bandwidth limits, but it is less useful for workloads where latency matters. If some groups do not use their full bandwidth allocations, a block I/O device may go idle even though other groups, which have hit their limits, have outstanding I/O requests. The block I/O controller also depends heavily on the CFQ I/O scheduler and loses functionality in its absence. It doesn't work at all with multiqueue block devices — the type of devices most likely to be in use in settings where the I/O controller is needed.
The I/O latency controller, written by Josef Bacik, addresses these problems by regulating latency (instead of bandwidth) at a relatively low level in the block layer. When it is in use, each control group directory has an io.latency file that can be used to set the parameters for that group. One writes a line to that file following this pattern:
major:minor target=target-time
where major and minor identify the specific block device of interest, and target-time is the maximum latency (in milliseconds) that this group should experience.
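As a concrete illustration, the line-formatting and file write can be sketched in a few lines of Python. The helper name, the cgroup path in the comment, and the device numbers are all invented for the example; only the "major:minor target=time" format comes from the interface described above.

```python
import os

def set_io_latency(cgroup_dir, major, minor, target_ms):
    """Write an io.latency line of the form 'major:minor target=<ms>'
    into the given control-group directory.  Hypothetical helper; the
    file format follows the description above."""
    line = f"{major}:{minor} target={target_ms}"
    with open(os.path.join(cgroup_dir, "io.latency"), "w") as f:
        f.write(line)
    return line

# Example (assumed device 259:0 and cgroup path):
#   set_io_latency("/sys/fs/cgroup/my-group", 259, 0, 10)
```

The major and minor numbers for a device can be found with ls -l on the device node (e.g. /dev/nvme0n1) or in /proc/partitions.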
The controller tracks the actual latency seen by each group, using a relatively short (100ms) window. If a given group starts to miss its target, all other peer groups with larger targets are throttled to free up some bandwidth; the group with the tightest latency target is thus given the highest priority for access to the device. If all groups are meeting their targets, no throttling is done, so no bandwidth should go to waste if there is a need for it.
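The throttling policy described above can be modeled in a short sketch: when one group misses its latency target, the peers with larger (looser) targets are the ones selected for throttling. This is purely illustrative of the policy, not the kernel's implementation; the data structure and function name are invented.

```python
def groups_to_throttle(groups, victim):
    """Given {name: (target_ms, measured_ms)} and the name of a group
    that is missing its latency target, return the peer groups that
    would be throttled: those with a larger (looser) target."""
    victim_target = groups[victim][0]
    return sorted(name for name, (target, _) in groups.items()
                  if name != victim and target > victim_target)

groups = {
    "web":   (10, 12),    # target 10ms, measured 12ms: missing its target
    "batch": (100, 40),
    "logs":  (50, 20),
}
# groups_to_throttle(groups, "web") -> ["batch", "logs"]
```

Since "web" has the tightest target, both of its peers give up bandwidth on its behalf; if every group were meeting its target, the returned list would be irrelevant because no throttling would happen at all.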
On its face, throttling block I/O seems like a straightforward task: if a process needs to be slowed down, simply don't dispatch as many of its requests to the device. But block I/O is a bit strange in that much of it is initiated outside of the context of the process that is ultimately responsible for its creation. One example is filesystem metadata I/O, which is generated by the filesystem itself at a time of its own choosing. Slowing down that I/O may interfere with the filesystem's ordering decisions and create locking problems — without slowing down the target process at all. I/O generated by swapping is another example; it is generated when the kernel needs to reclaim memory, which may not be when the process being swapped is actually running. Slowing down swap I/O will slow down the freeing of memory for other uses — not a particularly good idea when the system is short of memory.
Kernel developers who introduce that kind of behavior have a relatively high likelihood of needing to look for openings in the fast-food industry in the near future. So the latency controller does no such thing; it only delays the dispatch of I/O that is generated directly by a process running inside a control group that is to be throttled. A process reading rapidly from a file may thus find that its reads start taking longer when throttling goes into effect, for example.
A different approach is needed for indirectly generated block I/O, though. In such cases, the latency controller will record the amount of needed delay in the control group itself. Whenever a process running within that group returns from a system call — a setting where it is known that no locks are held — that process will be put to sleep for a period to pay back some of that delay. The sleep period can be as long as 250ms in severe cases. If I/O traffic eases up and throttling is no longer necessary, any remaining delays will be forgotten.
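The delay bookkeeping for indirect I/O can be sketched as a toy model. The class and method names are invented; only the behavior — accumulating a delay debt in the group, paying it back in sleeps capped at 250ms when a process returns from a system call, and forgetting any remainder when throttling ends — follows the mechanism described above.

```python
MAX_SLEEP_MS = 250  # per-payback cap mentioned in the article

class GroupDelay:
    """Toy model of per-control-group delay accounting (names invented)."""

    def __init__(self):
        self.owed_ms = 0

    def charge(self, ms):
        # Indirect I/O done on this group's behalf records a delay debt
        # in the group rather than stalling the I/O itself.
        self.owed_ms += ms

    def pay_on_syscall_return(self):
        # At system-call return (no locks held), sleep off some of the
        # debt, capped at MAX_SLEEP_MS; returns the sleep length.
        sleep = min(self.owed_ms, MAX_SLEEP_MS)
        self.owed_ms -= sleep
        return sleep

    def throttling_ended(self):
        # If throttling is no longer needed, remaining debt is forgotten.
        self.owed_ms = 0
```

Paying the debt at system-call return is the key design point: it is a moment when the process is known to hold no kernel locks, so sleeping there cannot create the ordering or locking problems that delaying the I/O itself would.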
In the patch introducing the controller itself, Bacik notes that using it results in a slightly higher number of requests per second (RPS) overall, and a significant reduction in the variability of RPS rates over time. There is another interesting result, in that this controller can help to protect the system against runaway processes.
The throttling, seemingly, slows the allocating process enough to allow the OOM killer to do its job before the system runs completely out of memory.
This patch set has been through six revisions as of this writing, with some significant changes in the implementation happening along the way. That work appears to be coming to a close, though: it earned the elusive Quacked-at-by tag from Andrew Morton, and block maintainer Jens Axboe has indicated that it has been applied for the 4.19 development cycle. So the latency for the delivery of the block I/O latency controller would appear to be three or four months at this point.