5.15 Merge window, part 1
Architecture-specific
- The s390 architecture has gained support for the KFENCE and
KASANKCSAN development tools.
Core kernel
- It is now possible to place entire control groups into the SCHED_IDLE scheduling class — something that could only be done at the task level before. The group as a whole will only run when there is nothing else for the CPU to do, but tasks within the group will still have their relative weights.
- After something like 17 years of development effort, the realtime preemption locking code has been merged. This work began in 2004 and has fundamentally changed many parts of the core kernel. With this pull, the sleepable locks that make deterministic realtime response possible have finally joined all of that other work (though the kernel must be built with the REALTIME configuration option to use them). This merge log describes the major changes that this code brings.
- The io_uring subsystem now supports opening files directly into the fixed-file table without the use of a file descriptor. This can yield some significant performance improvements for certain types of workloads; it also is a significant break from the Unix tradition of using file descriptors for open files.
- Also new in io_uring is a new "BIO recycling" mechanism that cuts out some internal memory-management overhead; the result, it is claimed, is a 10% increase in the number of I/O operations per second that io_uring can sustain.
- Finally, io_uring has gained support for the mkdirat(), symlinkat(), and linkat() system calls.
- BPF programs can now request and respond to timer events. The timer API is severely undocumented; some terse information is available in this commit and this one, and there is a test program that contains an example.
- Core scheduler support for scheduling on asymmetric systems has been merged. There is another piece to make use of this functionality on Arm processors that is presumably coming later in the merge window.
Filesystems and block I/O
- The fanotify API has a new option, FAN_REPORT_PIDFD, which causes a pidfd to be returned as part of the event metadata. This (privileged) feature allows race-free identification of processes accessing monitored files.
- A set of hole-punching fixes should eliminate a class of subtle race conditions that could lead to file corruption.
- Support for mandatory file locking has been deprecated for years; it works poorly and is little used (if at all). As of 5.15, support for mandatory locking has been removed altogether.
- The LightNVM subsystem, which provided direct access to solid-state storage without an emulation layer, has been removed. According to the commit message, LightNVM has been superseded by newer NVMe standards and is no longer needed.
- The kernel finally has an in-kernel server for the SMB filesystem
protocol family. According to the merge
message:
ksmbd is a new kernel module which implements the server-side of the SMB3 protocol. The target is to provide optimized performance, GPLv2 SMB server, and better lease handling (distributed caching). The bigger goal is to add new features more rapidly (e.g. RDMA aka "smbdirect", and recent encryption and signing improvements to the protocol) which are easier to develop on a smaller, more tightly optimized kernel server than for example in Samba.
The Samba project is much broader in scope (tools, security services, LDAP, Active Directory Domain Controller, and a cross platform file server for a wider variety of purposes) but the user space file server portion of Samba has proved hard to optimize for some Linux workloads, including for smaller devices.
This is not meant to replace Samba, but rather be an extension to allow better optimizing for Linux, and will continue to integrate well with Samba user space tools and libraries where appropriate. Working with the Samba team we have already made sure that the configuration files and xattrs are in a compatible format between the kernel and user space server.
- The Btrfs filesystem has gained support for fs-verity file integrity assurance and ID-mapped mounts.
- The move_mount() system call (described in this article) has been extended to allow adding a mount to an existing sharing group. This relatively obscure new feature evidently solves a lot of problems for the Checkpoint/Restore in Userspace developers; see this commit for more information.
Hardware support
- Miscellaneous: Richtek RTQ6752 TFT LCD voltage regulators, Richtek RTQ2134 SubPMIC regulators, Rockchip serial flash controllers, Arm SMCCC random-number generators, and Aquacomputer D5 Next watercooling pumps.
- Networking: MediaTek Gigabit Ethernet PHYs, MHI WWAN MBIM interfaces, and LiteX Ethernet interfaces.
- Power supply: ChromeOS EC based peripheral chargers and Mediatek MT6360 chargers.
- Virtual: Virtio I2C adapters.
Networking
- The networking subsystem now has support for per-VLAN multicast. The truly determined can find some information about this feature in this merge changelog.
- The In-situ Operations, Administration, and Maintenance (IOAM) subsystem has gained support for the pre-allocated trace mechanism; see this merge commit for some more information.
- The Management Component Transport Protocol (MCTP) is now supported; see this documentation commit for details.
- Unix-domain sockets have gained support for out-of-band data.
Security-related
- There is a new prctl() operation called PR_SPEC_L1D_FLUSH. If a process turns this on, the kernel will flush the L1D (level-1 data) cache whenever that process is scheduled out of the CPU. This should help to mitigate a number of potential speculative-execution vulnerabilities that can cause data to be leaked from the L1D cache — at a significant performance cost. Note that this feature will not protect against a hostile process running on an SMT sibling processor; a feature like core scheduling must be used to protect against that case. The new prctl() can be disabled by the administrator; see this documentation patch for details.
- The device mapper has gained support for remote attestation using the kernel's integrity measurement architecture. See Documentation/admin-guide/device-mapper/dm-ima.rst for details.
The 5.15 merge window can be expected to stay open until September 12,
assuming that the usual schedule holds. LWN will be back with coverage of
the remainder of the merge window immediately after it closes; it seems
likely that there is quite a bit of work yet to be pulled for this
development cycle.
Index entries for this article | |
---|---|
Kernel | Releases/5.15 |