The second half of the 4.17 merge window
Core kernel
- The CLOCK_MONOTONIC and CLOCK_BOOTTIME clocks used to differ only in that the latter is fast-forwarded after a suspend-and-resume cycle. As of 4.17, CLOCK_MONOTONIC is also moved forward to reflect the time that the system spent suspended. As a result, the two clocks are now identical and have been unified within the kernel. Among other things, that change eliminates a potentially surprising behavior wherein the offset between the monotonic and realtime clocks would change after a resume. Thomas Gleixner noted: "There might be side effects in applications, which rely on the (unfortunately) well documented behaviour of the MONOTONIC clock, but the downsides of the existing behaviour are probably worse." If applications do break, this change may have to be reverted. Meanwhile, there is a new clock (CLOCK_MONOTONIC_ACTIVE) that only advances when the system is actually running.
- The new INOTIFY_IOC_SETNEXTWD ioctl() command allows inotify users to specify the number of the descriptor they would like to see returned for the next watch descriptor they create. This is used for checkpoint/restart.
- After a few years of waiting, the histogram trigger feature was added to the tracing subsystem. This mechanism enables the easy creation, in kernel space, of histograms from tracing data.
- The mmap() system call supports a new MAP_FIXED_NOREPLACE option. Like MAP_FIXED, it tries to place the new memory region at a user-supplied address. Unlike MAP_FIXED, though, it will not replace an existing mapping at that address; instead, it will fail with EEXIST if such a mapping exists. This is the change that was discussed last year as MAP_FIXED_SAFE; it seems that the battle over the proper name for the feature has finally been resolved.
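The contrast between MAP_FIXED and MAP_FIXED_NOREPLACE can be sketched from Python by calling mmap() through ctypes, since Python's own mmap module does not let the caller choose an address. This is a Linux-only illustration under stated assumptions: the flag value 0x100000 is taken from the generic Linux headers (a few architectures may use a different value), and map_anonymous() and try_map_noreplace() are hypothetical helper names. Kernels that predate the flag treat the address as a hint, so the sketch also checks the returned address and, as its own convention, reports EEXIST in that case.

```python
import ctypes
import errno
import mmap

# Assumption: value from the generic Linux uapi headers; Python does not export it.
MAP_FIXED_NOREPLACE = 0x100000

libc = ctypes.CDLL(None, use_errno=True)
libc.mmap.restype = ctypes.c_void_p
libc.mmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int,
                      ctypes.c_int, ctypes.c_int, ctypes.c_long]
libc.munmap.restype = ctypes.c_int
libc.munmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t]

MAP_FAILED = ctypes.c_void_p(-1).value  # (void *) -1


def map_anonymous(length, addr=None, extra_flags=0):
    """Return (address, 0) on success or (None, errno) on failure."""
    flags = mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS | extra_flags
    prot = mmap.PROT_READ | mmap.PROT_WRITE
    got = libc.mmap(addr, length, prot, flags, -1, 0)
    if got == MAP_FAILED:
        return None, ctypes.get_errno()
    return got, 0


def try_map_noreplace(addr, length):
    """Ask for a mapping at exactly addr without replacing what is there."""
    got, err = map_anonymous(length, addr, MAP_FIXED_NOREPLACE)
    if got is not None and got != addr:
        # An older kernel ignored the flag and picked another address:
        # undo that mapping and report it like a collision (sketch's choice).
        libc.munmap(got, length)
        return None, errno.EEXIST
    return got, err
```

A caller would typically treat a (None, errno.EEXIST) result as "something already lives there" and either choose another address or give up, rather than falling back to MAP_FIXED.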
Architecture-specific
- The ARM architecture has gained support for the "system control and management interface", or SCMI. It is a set of standards for system management and, in particular, power management.
- 64-bit PowerPC systems now have the ability to address up to 4PB of memory.
- Support for POWER4 processors was accidentally (they swear) broken in 2016, and nobody complained. So support for those processors has been removed entirely on the assumption that nobody is using them anymore.
Filesystems
- The overlayfs filesystem can, at times, present different inode numbers for the same file at different times, potentially confusing applications that use those numbers. The "xino" option added for 4.17 will store the filesystem ID in the upper part of the inode number, which allows it to present inode numbers that will not change over time. Some information can be found in Documentation/filesystems/overlayfs.txt.
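The idea of placing a filesystem ID in the upper part of an inode number can be illustrated with some bit arithmetic. The following is only a sketch of the general technique, not the kernel's actual layout: the names xino_compose() and xino_split(), the xino_bits parameter, and the choice to return None on overflow are all assumptions made for the example.

```python
INO_BITS = 64  # width of an inode number as reported by stat()


def xino_compose(fsid, real_ino, xino_bits):
    """Pack fsid into the top xino_bits bits of a 64-bit inode number.

    Returns None when real_ino already uses the reserved high bits or
    when fsid does not fit, since no stable number can be built then.
    """
    shift = INO_BITS - xino_bits
    if real_ino >> shift or fsid >> xino_bits:
        return None
    return (fsid << shift) | real_ino


def xino_split(ino, xino_bits):
    """Recover (fsid, real_ino) from a composed inode number."""
    shift = INO_BITS - xino_bits
    return ino >> shift, ino & ((1 << shift) - 1)
```

Because the composed value depends only on the underlying filesystem and its own inode number, it stays the same no matter when the file is looked up, which is the property the bullet describes.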
Security-related
- The kernel now supports the Speck cipher, a block cipher that is said to outperform AES on systems without hardware AES support.
- AES encryption in Cipher Feedback Mode is now supported; this is required for TPM2 cryptography.
- The SM4 symmetric cipher algorithm is supported; it is "an authorized cryptographic algorithm for use within China" according to the commit.
- The SCTP protocol now has complete SELinux support; see Documentation/security/SELinux-sctp.rst for details.
- The AppArmor security module has gained basic support for the control of socket use. See this commit for a little bit of documentation.
Hardware support
- Audio: Texas Instruments PCM1789 codecs, AKM AK4458 and AK5558 codecs, Rohm BD28623 codecs, Motorola CPCAP codecs, Maxim MAX9759 speaker amplifiers, ST TDA7419 audio processors, and UniPhier AIO audio subsystems.
- Cryptographic: ARM TrustZone CryptoCell security processors and TI Keystone NETCP SA hardware random-number generators.
- Industrial I/O: Melexis MLX90632 infrared sensors, Analog Devices AD5272 digital potentiometers, On Semiconductor LV0104CS ambient light sensors, and Microchip MCP4017/18/19 digital potentiometers.
- USB: HiSilicon STB SoCs COMB PHYs, AMLogic Meson GXL and GXM USB3 PHYs, STMicroelectronics STM32 USB HS PHY controllers, HiSilicon INNO USB2 PHYs, Motorola Mapphone MDM6600 USB PHYs, Pericom PI3USB30532 Type-C cross switches, ELAN USB touchpads, and devices supporting USB class 3 audio.
- Miscellaneous: QCOM on-chip GENI based serial ports, MediaTek SoC gigabit Ethernet controllers, Raspberry Pi 3 GPIO expanders, Nintendo Wii GPIO controllers, Spreadtrum SC9860 platform GPIO controllers, RAVE SP power buttons, PhoenixRC flight controller adapters, HiSilicon hi3660 mailbox controllers, Socionext SynQuacer I2C controllers, Intersil ISL12026 realtime clocks, Nuvoton NPCM750 watchdog timers, Mediatek MT2701 audsys clocks, Allwinner H6 clock controllers, Silicon Labs 544 I2C clock generators, Synopsys DesignWare AXI DMA controllers, and MediaTek High-Speed DMA controllers.
Other
- The ABI for 32-bit RDMA users has changed in incompatible ways. The changes are justified with the claim that there are no actual users of the 32-bit mode now, but some may be coming in the future.
Internal kernel changes
- The way that system calls are invoked on the x86-64 architecture has been reworked to make it more uniform and flexible. The new scheme has also been designed to prevent unused (but caller-controlled) data from getting onto the call stack — where it could perhaps be used in a speculative-execution attack.
- The lexer and parser modules used by the kernel build process are now themselves built on the target system (requiring flex and bison) rather than being shipped in the kernel repository.
As expected, the final diffstat for this merge window shows that more lines of code were deleted than added — 191,000 more. This is only the third time in the kernel's history that a release has been smaller than its predecessor.
Also possibly worthy of note is that the final SCSI pull pushed the kernel repository to over six million objects. Linus added: "I was joking around that that's when I should switch to 5.0, because 3.0 happened at the 2M mark, and 4.0 happened at 4M objects. But probably not, even if numerology is about as good a reason as any."
This kernel now enters the stabilization process, which will culminate
in the final 4.17 (or maybe 5.0?) release in early June.
Index entries for this article | |
---|---|
Kernel | Releases/4.17 |
Posted Apr 16, 2018 18:59 UTC (Mon)
by josh (subscriber, #17465)
Posted Apr 17, 2018 17:47 UTC (Tue)
by k8to (guest, #15413)
There may be more subtle problems, and I'd like to hear about them too. Expanding knowledge of errors in time code is kind of valuable because there are so many to make.
Posted Apr 16, 2018 22:48 UTC (Mon)
by glenn (subscriber, #102223)
Also, are timers (i.e., timerfd()) against CLOCK_MONOTONIC_ACTIVE supported? If not, my code base may need a lot of rework...
Posted Apr 17, 2018 5:51 UTC (Tue)
by epa (subscriber, #39769)
Posted Apr 17, 2018 9:05 UTC (Tue)
by tglx (subscriber, #31301)
We have discussed that back and forth and finally decided to give it a try. If you or anyone else observes wreckage please let us know immediately.
Posted Apr 17, 2018 9:18 UTC (Tue)
by epa (subscriber, #39769)
Posted Apr 19, 2018 12:47 UTC (Thu)
by lynxeye (subscriber, #90890)
This is unexpected and I bet most of the graphics userspace will fall over if it hits such a condition.
Posted Apr 27, 2018 3:38 UTC (Fri)
by njs (guest, #40338)
Did you keep the relationship between sleeping syscalls and CLOCK_MONOTONIC – so that e.g. a nanosleep() before suspend will now wake up immediately on resume? Or did you keep the old sleeping syscall semantics, and break the relationship with CLOCK_MONOTONIC?
As far as I know, all correct event loops currently depend on the assumption that sleeping syscalls and CLOCK_MONOTONIC match each other. For example, if I set a timeout for T seconds from now, the event loop will:
- use (clock_gettime(CLOCK_MONOTONIC) + T) to calculate the absolute time of the timeout
Right now that's sufficient to ensure that epoll_wait() will return when clock_gettime(CLOCK_MONOTONIC) == deadline, or thereabouts... but if CLOCK_MONOTONIC starts counting suspend time, while epoll_wait() doesn't, then we'll start sleeping too long and missing our deadlines by an arbitrary amount.
Or at least, that's what the event loop I maintain does, which is why I want to know :-).
(As an added bonus, if I *do* have to switch to CLOCK_MONOTONIC_ACTIVE, that's going to be a hassle. Currently the event loop is implemented in Python, and the Python standard library obviously doesn't yet have any bindings for CLOCK_MONOTONIC_ACTIVE. Given where we are in the release cycle, the earliest they could be added is 1.5-2 years from now. In the mean time I guess it becomes temporarily impossible to implement an event loop in Python on Linux; you have to write part of it in C, and that's a huge obstacle for distribution :-(.)
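The deadline arithmetic described in this comment (compute an absolute deadline from the monotonic clock, then later convert it back into a relative timeout for the sleeping call) might look roughly like the sketch below. The helper names make_deadline() and remaining_timeout() are hypothetical, and the injectable clock parameter exists only to make the logic easy to exercise; on Linux, CPython's time.monotonic() reads CLOCK_MONOTONIC.

```python
import time


def make_deadline(timeout_seconds, clock=time.monotonic):
    """Absolute deadline on the monotonic clock, T seconds from now."""
    return clock() + timeout_seconds


def remaining_timeout(deadline, clock=time.monotonic):
    """Relative timeout to hand to a sleeping call; never negative."""
    return max(0.0, deadline - clock())
```

An event loop would pass remaining_timeout(deadline) to something like select.epoll().poll(). The scheme is only correct if the sleeping call and the clock agree about whether suspend time counts: if the clock advances across a suspend while the sleep does not, the wakeup arrives late by roughly the length of the suspend.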
Posted Apr 27, 2018 3:52 UTC (Fri)
by njs (guest, #40338)
On further investigation, it looks like it's not quite as bad as I thought – CLOCK_MONOTONIC_ACTIVE can be queried from Python with:
time.clock_gettime(12)
(Untested, since I don't have a kernel with CLOCK_MONOTONIC_ACTIVE support).
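A slightly more defensive version of that query could catch the error raised when the running kernel does not know the clock ID. This is a sketch only: the numeric ID 12 is the value quoted in the comment and is not a constant provided by Python's time module, and read_clock() is a hypothetical helper name.

```python
import time

# Assumption: numeric ID quoted in the comment; not defined by the time module.
CLOCK_MONOTONIC_ACTIVE = 12


def read_clock(clock_id):
    """Return the clock reading in seconds, or None if the kernel rejects the ID."""
    try:
        return time.clock_gettime(clock_id)
    except OSError:
        return None
```

Code that wants the "active" clock could call read_clock(CLOCK_MONOTONIC_ACTIVE) and fall back to time.CLOCK_MONOTONIC when the result is None.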
Posted Apr 17, 2018 17:52 UTC (Tue)
by k8to (guest, #15413)
Are we worried that the time jumping forward may expire many timers at once causing programs to do work? That seems correct. It's fairly easy for programs with many expired timers to amortize the cost of doing the work those timers represent, and they probably need to have that logic in place anyway if they hope to self-regulate.
If you're instead worried about many different programs having expiring timers and fighting over resources, that seems like a problem that requires a co-ordinating facility. Grand Central Dispatch from Apple would be one approach. Of course, in a way, the operating system's basic task switching functions are another.
The other option would be some software that thinks it needs to do some work for every interval window, so that if 1000 intervals are passed, it insists on doing 1000 times the work. That behavior is either required (if for example, there's a requirement to look at each time interval's data sample), or is fundamentally broken. I'm not sure how this particular change really affects either of those two situations.
Am I missing something?
Posted Apr 17, 2018 19:59 UTC (Tue)
by glenn (subscriber, #102223)
This is my concern. I've used CLOCK_MONOTONIC timers to trigger periodic tasks, such as transmit a heartbeat/health-status message, run a watchdog check, etc. Another use-case could be a timer that drives a game loop or animation. The logic surrounding these routines is simple because the (old) CLOCK_MONOTONIC is simple. The software built up around such timers might hide the underlying timer mechanisms (e.g., a timerfd file descriptor), so higher-level application-level software might be unable to reprogram the underlying timer (or cancel it).
Posted Apr 17, 2018 20:07 UTC (Tue)
by k8to (guest, #15413)
Posted Apr 17, 2018 20:53 UTC (Tue)
by glenn (subscriber, #102223)
For one-shot timers, I believe that you are correct. My concern is with periodic timers.
Consider the use case of timerfd with a 10Hz periodic timer on CLOCK_MONOTONIC. Your application logic invokes a callback for every increment of the timerfd counter. Before you suspend, the timerfd count is 0---you have no callbacks to execute. You wake from suspension after an hour. The timerfd counter has been fast-forwarded and has a backlogged count of 36,000. If your application logic is simple, you'll invoke your callback in a burst of 36k invocations as you burn the counter back down to zero.
Posted Apr 18, 2018 5:47 UTC (Wed)
by k8to (guest, #15413)
> read(2)
If you get a read() of 36,000 and you execute your logic 36,000 times your program is just busted. Runaway could occur without this quirk.
Posted Apr 18, 2018 17:33 UTC (Wed)
by glenn (subscriber, #102223)
That is a fair point. However, this kind of defensive programming was unnecessary under the old CLOCK_MONOTONIC contract. Moreover, if code needs to be updated to detect unexpected timer backlogs, the developer has to make a judgement call on how many backlogged timers are too many: It may not always be clear if a backlog is due to system suspension or if an application is simply unable to service its timers fast enough (either due to its own execution behaviors, or due to those of other processes inducing CPU starvation). Setting a timer against CLOCK_MONOTONIC_ACTIVE may be an easier countermeasure. In either case, userspace has to change.
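The backlog handling debated in this sub-thread can be sketched as follows. A 10Hz timer that is fast-forwarded across an hour of suspend accumulates 10 × 3600 = 36,000 expirations, which read() on the timerfd reports as a single 8-byte counter. The cap used here (max_catch_up) is exactly the judgement call mentioned above and is a hypothetical parameter, as are the helper names.

```python
import struct


def decode_expirations(buf):
    """Decode the 8-byte host-endian counter that read(2) returns for a timerfd."""
    (count,) = struct.unpack("=Q", buf)
    return count


def run_periodic(count, callback, max_catch_up=3):
    """Invoke callback for at most max_catch_up of the count expirations.

    Returns the number of expirations that were skipped.
    """
    runs = min(count, max_catch_up)
    for _ in range(runs):
        callback()
    return count - runs
```

With the default cap, a backlog of 36,000 results in three callback invocations and 35,997 skipped expirations; whether skipping is acceptable depends on whether each interval carries data that must be examined.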
Posted Apr 19, 2018 12:43 UTC (Thu)
by clugstj (subscriber, #4020)
Posted Apr 26, 2018 9:03 UTC (Thu)
by sourcejedi (guest, #45153)
https://bugzilla.redhat.com/show_bug.cgi?id=1524412
Posted Apr 26, 2018 9:39 UTC (Thu)
by tkhai (guest, #99286)
This is not for checkpoint/restart, this is for checkpoint/restore :D
Possible side-effects of CLOCK_MONOTONIC change?
- later, when it calls epoll_wait(), it'll choose the timeout by doing (deadline - clock_gettime(CLOCK_MONOTONIC))
- then it passes that timeout to epoll_wait()
> If the timer has already expired one or more times since its
> settings were last modified using timerfd_settime(), or since
> the last successful read(2), then the buffer given to read(2)
> returns an unsigned 8-byte integer (uint64_t) containing the
> number of expirations that have occurred.