5.4 Merge window, part 2
Changes merged in the second half of the merge window include:
Architecture-specific
- The PowerPC architecture has gained support for an "ultravisor", which is an especially privileged layer of software charged with keeping the hypervisor in line. See this document for details.
Core kernel
- There is a new operation, IORING_OP_TIMEOUT, that can be requested from the io_uring subsystem. It will cause the calling process to be woken after the specified timeout period; see this commit for details.
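As a rough illustration of the new operation, the sketch below submits a single IORING_OP_TIMEOUT request using raw system calls and waits for it to complete. The helper name uring_sleep() is made up for this example, the ring size of four entries is arbitrary, and the code assumes kernel headers recent enough (5.5 or later) to provide <linux/time_types.h>. Most real programs would likely use liburing rather than mapping the rings by hand.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE 1                 /* for syscall() */
#endif
#include <errno.h>
#include <linux/io_uring.h>
#include <linux/time_types.h>         /* struct __kernel_timespec (assumed: 5.5+ headers) */
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper, not taken from the kernel or from liburing: block for
 * the given interval by submitting one IORING_OP_TIMEOUT request and waiting
 * for its completion. Returns the completion result (-ETIME when the timer
 * expired) or a negative errno if io_uring could not be used.
 */
static int uring_sleep(long long sec, long long nsec)
{
    struct io_uring_params p;
    memset(&p, 0, sizeof(p));
    int fd = (int)syscall(__NR_io_uring_setup, 4, &p);
    if (fd < 0)
        return -errno;

    /* Map the submission ring, the completion ring, and the SQE array. */
    size_t sq_len = p.sq_off.array + p.sq_entries * sizeof(unsigned);
    size_t cq_len = p.cq_off.cqes + p.cq_entries * sizeof(struct io_uring_cqe);
    size_t sqe_len = p.sq_entries * sizeof(struct io_uring_sqe);
    int prot = PROT_READ | PROT_WRITE, flags = MAP_SHARED | MAP_POPULATE;
    char *sq = mmap(NULL, sq_len, prot, flags, fd, IORING_OFF_SQ_RING);
    char *cq = mmap(NULL, cq_len, prot, flags, fd, IORING_OFF_CQ_RING);
    struct io_uring_sqe *sqes = mmap(NULL, sqe_len, prot, flags, fd, IORING_OFF_SQES);
    int ret = -ENOMEM;
    if (sq == MAP_FAILED || cq == MAP_FAILED || (void *)sqes == MAP_FAILED)
        goto out;

    unsigned *sq_tail = (unsigned *)(sq + p.sq_off.tail);
    unsigned *sq_mask = (unsigned *)(sq + p.sq_off.ring_mask);
    unsigned *sq_array = (unsigned *)(sq + p.sq_off.array);
    unsigned *cq_head = (unsigned *)(cq + p.cq_off.head);
    unsigned *cq_tail = (unsigned *)(cq + p.cq_off.tail);
    unsigned *cq_mask = (unsigned *)(cq + p.cq_off.ring_mask);
    struct io_uring_cqe *cqes = (struct io_uring_cqe *)(cq + p.cq_off.cqes);

    /* Fill in one submission entry describing the timeout. */
    struct __kernel_timespec ts = { .tv_sec = sec, .tv_nsec = nsec };
    unsigned tail = *sq_tail, idx = tail & *sq_mask;
    memset(&sqes[idx], 0, sizeof(sqes[idx]));
    sqes[idx].opcode = IORING_OP_TIMEOUT;
    sqes[idx].fd = -1;                               /* no file involved */
    sqes[idx].addr = (__u64)(unsigned long)&ts;      /* pointer to the timespec */
    sqes[idx].len = 1;                               /* exactly one timespec */
    sqes[idx].user_data = 1;                         /* arbitrary tag */
    sq_array[idx] = idx;
    __atomic_store_n(sq_tail, tail + 1, __ATOMIC_RELEASE);

    /* Submit one request and sleep until one completion is posted. */
    if (syscall(__NR_io_uring_enter, fd, 1, 1, IORING_ENTER_GETEVENTS, NULL, 0) < 0) {
        ret = -errno;
        goto out;
    }

    unsigned head = *cq_head;
    if (head != __atomic_load_n(cq_tail, __ATOMIC_ACQUIRE)) {
        ret = cqes[head & *cq_mask].res;
        __atomic_store_n(cq_head, head + 1, __ATOMIC_RELEASE);
    } else {
        ret = -EAGAIN;                               /* no completion found */
    }

out:
    if (sq != MAP_FAILED)
        munmap(sq, sq_len);
    if (cq != MAP_FAILED)
        munmap(cq, cq_len);
    if ((void *)sqes != MAP_FAILED)
        munmap(sqes, sqe_len);
    close(fd);
    return ret;
}
```

On a kernel with this support, a call like uring_sleep(0, 10000000) should block for roughly 10ms and return -ETIME, which is the result an expired timeout completes with; where io_uring is unavailable or disabled, it returns some other negative error code instead.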
Filesystems and block layer
- The dm-verity subsystem can now validate the root hash of a volume using a trusted key in the kernel keyring.
- The new dm-clone target makes a copy of an existing read-only device. "The main use case of dm-clone is to clone a potentially remote, high-latency, read-only, archival-type block device into a writable, fast, primary-type device for fast, low-latency I/O". More information can be found in this commit.
- The F2FS filesystem has gained support for case-independent file-name lookups. See this commit for some details.
- The new "virtiofs" filesystem allows a host to export filesystems efficiently to guest systems. See this document and this commit message for more information.
- It's not in 5.4 but worth a mention anyway: Samsung has decided to upstream its internal "sdfat" filesystem; this is a newer implementation of exFAT that, it is said, has fewer code-quality problems and more features. So the exFAT implementation added to the staging tree earlier in the merge window probably has a short life expectancy, at least in its current form.
Hardware support
- Clock: Marvell Armada AP CPU clock controllers, MediaTek MT6779 clock controllers, Ingenic JZ47xx TCU clocks and interrupt controllers, and Amlogic Meson virtual realtime clocks.
- Miscellaneous: Freescale FlexTimer alarm timers, Macronix raw NAND controllers, Creative SB0540 infrared receivers, Intel Merrifield Basin Cove power-management ICs, NXP IMX7ULP watchdog timers, and Spreadtrum pulse-width modulators.
- PCI: Amazon Annapurna Labs PCIe controllers and NVIDIA Tegra194 PCIe controllers.
Memory management
- It is now possible to use transparent huge pages for read-only file-mapped virtual memory areas. In practice, for now, this feature only works with executable text sections; an madvise() call is required to turn it on. See this commit for a bit of detail.
- There are two new madvise() commands to force the kernel to reclaim specific pages. MADV_COLD moves the indicated pages to the inactive list, essentially marking them unused and suitable targets for page reclaim. A stronger variant is MADV_PAGEOUT, which causes the pages to be reclaimed immediately.
- When we last looked at this memory-management performance-regression problem, there was pressure to revert a change reverting a performance-related patch. That revert was reverted for 5.3-rc5; now the revert of the revert has been reverted for 5.4. So the original revert is now in place, and a couple of different patches addressing the original problem have been merged. See this changelog for some more information, along with Linus Torvalds's reasoning for bypassing the memory-management developers and applying these patches directly.
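The two new madvise() commands described above might be used along these lines. This is a minimal sketch: advise_scratch_buffer() is a hypothetical helper invented for the example, and the fallback definitions of MADV_COLD and MADV_PAGEOUT (20 and 21, the values in the kernel's generic header) are only there for systems whose C-library headers predate 5.4.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1             /* for madvise() and MAP_ANONYMOUS */
#endif
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Fallbacks for older C-library headers; values from the kernel's
 * asm-generic/mman-common.h. */
#ifndef MADV_COLD
#define MADV_COLD 20
#endif
#ifndef MADV_PAGEOUT
#define MADV_PAGEOUT 21
#endif

/*
 * Hypothetical helper: map `len` bytes of anonymous memory, touch every page
 * so that it is resident, then apply the given madvise() hint to the range.
 * Returns 0 on success or a negative errno (-EINVAL on kernels that do not
 * know the hint).
 */
static int advise_scratch_buffer(size_t len, int advice)
{
    unsigned char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -errno;

    memset(buf, 0xA5, len);           /* fault the pages in */

    int ret = (madvise(buf, len, advice) == 0) ? 0 : -errno;

    /* Neither hint discards data, so the contents must still be readable;
     * after MADV_PAGEOUT this access may have to fault pages back in. */
    if (ret == 0 && (buf[0] != 0xA5 || buf[len - 1] != 0xA5))
        ret = -EIO;

    munmap(buf, len);
    return ret;
}
```

A caller would pass MADV_COLD for memory it does not expect to touch soon, or MADV_PAGEOUT to ask for immediate reclaim, for example advise_scratch_buffer(1 << 20, MADV_PAGEOUT). Unlike MADV_DONTNEED, neither hint throws the contents away, which is why the sketch can still read the buffer afterward.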
Security-related
- The integrity-measurement (IMA) subsystem has gained support for verifying signatures appended to files. It has not, however, gained much in the way of documentation for this feature; what is available can be found in this commit.
- After years of work and controversy, the kernel lockdown patch set has been merged in the form of a Linux security module.
- In a last-minute move that, seemingly, is responsible for the one-day delay in the release of 5.4-rc1, Torvalds decided to merge an entropy-collection mechanism for random-number generation based on the "jitter entropy" idea. The purpose here is to address the boot-time entropy issues that can cause a system to hang during boot in some situations. This may not be the ultimate form of the solution:
"I'm not saying my patch is going to be the last word on the issue. I'm _personally_ ok with it and believe it's not crazy, and if it then makes serious people go 'Eww' and send some improvements to it, then it has served its purpose."
Torvalds was clear, though, that he wants to see some sort of solution to the boot-time entropy problem in 5.4.
Internal kernel changes
- The build system will now refuse to proceed if the gold linker is detected. There are a few problems that make gold unsuitable for kernel building; see this commit for details.
- Support for kernel symbol namespaces has been added, providing a way to bring some order to the many thousands of exported symbols.
- The checkpatch.pl tool will now warn about invalid commit IDs in changelogs.
The development community will now focus on stabilizing this work over the next 7-8 weeks, leading to an expected 5.4 release in the second half of November.
Posted Oct 1, 2019 9:02 UTC (Tue)
by meyert (subscriber, #32097)
[Link] (7 responses)
Posted Oct 1, 2019 9:11 UTC (Tue)
by gevaerts (subscriber, #21521)
[Link] (1 responses)
Posted Oct 1, 2019 11:10 UTC (Tue)
by smurf (subscriber, #17840)
[Link]
Posted Oct 1, 2019 10:48 UTC (Tue)
by tkreagan (subscriber, #4548)
[Link] (1 responses)
Posted Oct 4, 2019 21:34 UTC (Fri)
by mtaht (subscriber, #11087)
[Link]
I've been a lonely advocate of a rethink of how we do CPU architectures for a long time, and have called for more hardware support of features essential to faster context and privilege switching along the lines of what the Mill proposed ( https://millcomputing.com/docs/security/ )
Posted Oct 1, 2019 12:55 UTC (Tue)
by ncultra (✭ supporter ✭, #121511)
[Link] (2 responses)
The "ultravisor" inherits (receives, accepts?) a virtual machine from KVM. At that point, if KVM (and therefore Linux and QEMU) is untrusted, this is "shutting the barn door after all the animals have escaped." I don't doubt that the "ultravisor" would be able to monitor the hypercalls made by the compromised virtual machine from that point onward, encrypt and decrypt its virtual storage, etc., but to what effect, if the guest has already been compromised?
All I/O made by a "secure" virtual machine is virtual I/O. virtio through QEMU has a history of vulnerabilities, and involves QEMU having shared mappings with the virtual machine. This is problematic. It would be more secure to pass physical device functions directly to the virtual machine and to NOT allow virtual I/O from the secure virtual machine.
It seems as though this is a Rube Goldberg-like fix for shared processor and memory side-channel attacks dressed up like a new feature.
Posted Oct 3, 2019 2:19 UTC (Thu)
by roc (subscriber, #30627)
[Link] (1 responses)
Posted Oct 4, 2019 8:23 UTC (Fri)
by linuxram (guest, #22157)
[Link]
Here are some presentations that will explain the architecture better.
https://www.youtube.com/watch?v=pKh_mPPo9X4
https://static.sched.com/hosted_files/openpowerna19/45/Op...
https://www.youtube.com/watch?v=l4jccqc14Vc
https://static.sched.com/hosted_files/kvmforum2018/57/SVM...