
The first half of the 6.3 merge window

By Jonathan Corbet
February 23, 2023
As of this writing, 5,776 non-merge changesets have been pulled into the mainline kernel for the 6.3 release; that is a bit less than half of the work that was waiting in linux-next before the merge window opened. This merge window is thus well underway, but far from complete. Quite a bit of significant work has been pulled so far; read on to see what entered the kernel in the first half of the 6.3 merge window.

Changes merged to date include:

Architecture-specific

  • A large set of old and unused Arm board files has been removed, reducing the size of the kernel tree by over 150,000 lines. This (6.0) commit describes the list of systems for which board files have been removed. Meanwhile, devicetree files have been added to support 46 new arm64 systems.
  • The new virtconfig build target for arm64 systems creates a relatively lightweight configuration intended to be booted on virtual systems.
  • AMD's "automatic IBRS" feature is now supported. This is a Spectre defense that restricts indirect-branch speculation with less of a performance cost than that imposed by retpolines.
  • The m68k architecture has gained support for system-call filtering with seccomp().
  • Arm scalable matrix extension 2 instructions are now supported.
  • BPF trampolines are now fully supported on s390x and RISC-V RV64 systems.

Core kernel

  • The list of enhancements to the kernel's embryonic support for the Rust language is relatively small this time, but that support is, according to Miguel Ojeda, "getting closer to a point where the first Rust modules can be upstreamed". These changes include the removal of a non-applicable part of the alloc crate, an implementation of the Arc type (which provides a reference-counted pointer), the ScopeGuard type (which runs some cleanup code when it goes out of scope), and the ForeignOwnable type, which facilitates moving pointers between Rust and C code.
  • There is a new document covering the stability expectations for BPF kfuncs; it describes the current status in the ongoing discussion of how stable the BPF API should be.
  • The cgroup.memory=nobpf command-line parameter disables memory accounting for BPF programs; see this merge message for a discussion of the motivation behind this feature.
  • There is a new red-black tree data structure available to BPF programs. See this merge message for more information.
  • The restartable sequences mechanism now exports a "per-memory-map concurrency ID" to processes. This ID can be thought of as (and treated like) a CPU number, but the IDs are kept as close to zero as possible. Its purpose is to enable more efficient per-CPU data structures in applications that are only using a subset of the CPUs on a large system. This commit contains some more information.
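The concurrency-ID idea in the restartable-sequences item above can be illustrated with a toy Python allocator. This is a simplified model under stated assumptions, not the kernel's implementation: an ID is taken when one of the process's threads starts running and given back when it stops, and the lowest free ID is always chosen. A process that never runs more than a few threads at once therefore only ever sees small IDs, and can size its per-ID arrays by thread count rather than by CPU count.

```python
import heapq


class ConcurrencyIdAllocator:
    """Toy model of per-memory-map concurrency IDs (not the kernel's code).

    Assumption: an ID is acquired when a thread starts running and released
    when it stops; the lowest free ID is always handed out, keeping IDs dense
    near zero.
    """

    def __init__(self) -> None:
        self._next_unused = 0         # smallest ID never handed out so far
        self._free: list[int] = []    # min-heap of released IDs, all < _next_unused

    def acquire(self) -> int:
        # Reuse the smallest released ID first so the ID space stays compact.
        if self._free:
            return heapq.heappop(self._free)
        cid = self._next_unused
        self._next_unused += 1
        return cid

    def release(self, cid: int) -> None:
        heapq.heappush(self._free, cid)


# A process with 4 threads on a (hypothetical) 256-CPU machine needs only
# 4 slots when its per-"CPU" data is indexed by concurrency ID.
counters = [0] * 4
alloc = ConcurrencyIdAllocator()
running = [alloc.acquire() for _ in range(3)]   # IDs 0, 1, 2
for cid in running:
    counters[cid] += 1
alloc.release(running[1])
reused = alloc.acquire()   # 1: the released ID is reused instead of handing out 3
```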

Filesystems and block I/O

  • The tmpfs filesystem now supports ID-mapped mounts.
  • Erofs has gained support for per-CPU file-data decompression, leading to reduced data-access latency.
  • The Btrfs block allocator will now segregate extents by their size, so that any given block group is limited to extents that are small (less than 128KB), medium (up to 8MB), or large. This reportedly reduces fragmentation, especially in workloads where allocation size correlates with file lifetime, a pattern that apparently does occur in practice. See this commit message for some details.
  • Rotating disk drives still exist, and are even becoming more complex: multi-actuator drives have independently controllable arms that, for best performance, must all be kept busy. The BFQ I/O scheduler has gained support for such drives; this commit message has a bit more information on how it works.
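The Btrfs size-class policy described above can be sketched in a few lines of Python. This is an illustration of the thresholds given in the text, not the Btrfs code; in particular, treating exactly 128KB as "medium" and exactly 8MB as "medium" is an assumption, and place_extents() is a hypothetical helper that only shows the grouping effect.

```python
KIB = 1024
MIB = 1024 * KIB


def size_class(extent_bytes: int) -> str:
    # Thresholds from the text: small < 128KB, medium up to 8MB, else large.
    # Boundary handling (< vs <=) is an assumption of this sketch.
    if extent_bytes < 128 * KIB:
        return "small"
    if extent_bytes <= 8 * MIB:
        return "medium"
    return "large"


def place_extents(sizes):
    """Group extents so that each (hypothetical) block group holds one class."""
    groups = {"small": [], "medium": [], "large": []}
    for size in sizes:
        groups[size_class(size)].append(size)
    return groups


example = place_extents([4 * KIB, 1 * MIB, 64 * MIB, 16 * KIB])
# example["small"] holds the 4KB and 16KB extents, "medium" the 1MB one,
# and "large" the 64MB one.
```

Keeping short-lived small extents out of block groups full of large ones means that freeing them tends to empty whole groups rather than punch holes in mixed ones, which is the fragmentation argument made above.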

Hardware support

  • GPIO and pin control: Qualcomm QDU1000/QRU1000, IPQ5332, SA8775P, and SM8550 pin controllers, Mediatek MT7981 pin controllers, and StarFive JH7110 pin and GPIO controllers.
  • Hardware monitoring: MPS MPQ7932 regulators, HPE GXP fan controllers, NXP MC34VR500 power-management ICs, and Infineon TDA38640 voltage regulators.
  • Input: EVision keyboards and Steam Deck force feedback controllers.
  • Miscellaneous: Xilinx ZynqMP on-chip-memory controllers, MediaTek low-voltage thermal sensor controllers, Intel topology aware register/pm capsule interfaces, Aspeed ACRY RSA engines, StarFive JH7110 random number generators, Maxim MAX20411 single step-down converters, and Broadcom BCMBCA HS SPI controllers.
  • Networking: Microchip KSZ9563/LAN937x Ethernet switch PTP clocks, Realtek RTL8188EU wireless interfaces, Ocelot VSC7511, VSC7512, VSC7513 and VSC7514 external switches, Amlogic GXL-based MDIO bus multiplexers, Motorcomm 8531 PHYs, and Qualcomm WiFi 7 (ath12k) interfaces.
  • Sound: MediaTek MT8188 controllers, Iron Device SMA1303 audio amplifiers, Renesas IDT821034 quad PCM codecs, Awinic AW88395 audio amplifiers, Realtek RT712 SDCA codecs, and Infineon PEB2466 quad PCM codecs.
  • Also: preliminary support for writing human-interface device drivers in BPF has been merged, though the mechanism for distributing such drivers is still to be worked out. See this document for more information.

Networking

  • Support for the Physical Layer Collision Avoidance (PLCA) Reconciliation Sublayer has been added; it is said to improve access performance on shared-media Ethernet. This documentation patch describes how to configure and use this feature.
  • The "wireless extensions" API for the control of WiFi interfaces was deprecated in 2006 but is still supported as an emulation layer. This API will not be supported for WiFi 7 (802.11be) interfaces, since it is unable to configure all of the available features. As of 6.3, use of the wireless extensions API will generate a warning for most current devices.
  • The process of documenting the netlink API continues; the results can be seen in the core API and user-space API manuals. Also added is a new tool to generate netlink protocol code from YAML specifications.
  • The new IP_LOCAL_PORT_RANGE socket option makes it easier for multiple hosts to make outgoing connections through a NAT gateway; this commit contains details.
  • Multi-path TCP can now handle mixed flows using both the IPv4 and IPv6 protocols.
  • BIG TCP support has been extended to IPv4.
  • The new default_rps_mask sysctl knob allows the creation of a default, per-net-namespace receive packet steering (RPS) configuration.
  • Support for a number of queuing disciplines (specifically class-based queuing (CBQ), ATM virtual circuits (ATM), differentiated service marker (dsmark), traffic-control index (tcindex), and resource reservation protocol (RSVP)) has been removed due to a lack of maintenance and interest.
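For readers wondering how PLCA avoids collisions on a shared medium, here is a toy Python model of a single PLCA cycle. It is a simplification based on a reading of IEEE 802.3cg, not driver code: node 0 acts as the coordinator and opens each cycle with a BEACON, every node ID then gets one transmit opportunity in turn, and a node with nothing queued lets its opportunity time out. Burst mode, timers and the physical layer are ignored; the function name and event strings are illustrative only.

```python
def plca_cycle(node_count, pending):
    """Return the sequence of events on the medium for one PLCA cycle.

    node_count: number of transmit opportunities per cycle (node IDs 0..N-1).
    pending: dict mapping node ID -> number of queued frames; it is updated
             in place as frames are sent (one frame per opportunity here).
    """
    events = ["BEACON"]  # sent by the coordinator (node 0) to start the cycle
    for node_id in range(node_count):
        if pending.get(node_id, 0) > 0:
            pending[node_id] -= 1
            events.append(f"TX {node_id}")    # node uses its turn
        else:
            events.append(f"IDLE {node_id}")  # opportunity times out
    return events


queued = {0: 1, 2: 2}
first = plca_cycle(4, queued)   # ['BEACON', 'TX 0', 'IDLE 1', 'TX 2', 'IDLE 3']
second = plca_cycle(4, queued)  # ['BEACON', 'IDLE 0', 'IDLE 1', 'TX 2', 'IDLE 3']
```

Because only the node whose turn it is may transmit, the nodes never collide, and the bound on how long any node waits is set by the number of nodes rather than by contention.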
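The IP_LOCAL_PORT_RANGE option lets each socket narrow the range of ephemeral ports used for its outgoing connections, so that hosts sharing a NAT gateway can be given disjoint ranges. The Python sketch below assumes that the option value is a 32-bit integer carrying the low bound in its lower 16 bits and the high bound in its upper 16 bits, and that the option number is 51 as in <linux/in.h>; both should be checked against the headers of the running kernel. The helper names are hypothetical.

```python
import socket
import struct

# Assumption: 51 is the option number from <linux/in.h>; some Python
# versions may expose it directly as socket.IP_LOCAL_PORT_RANGE.
IP_LOCAL_PORT_RANGE = getattr(socket, "IP_LOCAL_PORT_RANGE", 51)


def pack_port_range(low: int, high: int) -> int:
    """Encode a port range as a u32: low bound in bits 0-15, high in 16-31."""
    if not (0 <= low <= high <= 0xFFFF):
        raise ValueError("expected 0 <= low <= high <= 65535")
    return (high << 16) | low


def restrict_local_ports(sock: socket.socket, low: int, high: int) -> None:
    # Pass the value as 4 native-endian bytes: a plain Python int would be
    # converted to a signed C int, which overflows once high >= 32768.
    value = pack_port_range(low, high)
    sock.setsockopt(socket.IPPROTO_IP, IP_LOCAL_PORT_RANGE,
                    struct.pack("=I", value))
```

A host behind the gateway would call restrict_local_ports() on each TCP socket before connect(), using a range that no other host uses; kernels older than 6.3 should reject the unknown option with an error.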

Internal kernel changes

  • The old memory-allocation function get_kernel_pages() has been removed now that there are no more in-tree users.

The 6.3 merge window can be expected to remain open until March 5, at which point 6.3-rc1 will come out and the kernel will enter the stabilization phase of the development cycle. Quite a few more changes are poised to enter the mainline before that happens, though; tune in once the merge window closes for a summary of the rest of that work.


The first half of the 6.3 merge window

Posted Feb 23, 2023 20:37 UTC (Thu) by Cyberax (✭ supporter ✭, #52523)

> There is a new red-black tree data structure available to BPF programs. See this merge message for more information.

Linked lists, trees, custom data types... Guys, stop reinventing WASM.

The first half of the 6.3 merge window

Posted Feb 24, 2023 1:41 UTC (Fri) by davemarchevsky (guest, #85534)

Could you elaborate re: "reinventing WASM" as it relates to this work? The crux of that work was teaching the BPF verifier to understand natural-looking linked_list and rbtree usage. I'm not familiar with WASM so comparison isn't obvious to me.

The first half of the 6.3 merge window

Posted Feb 24, 2023 11:42 UTC (Fri) by smurf (subscriber, #17840)

His point is (as I understand it) that instead of adding more and more fancy stuff to BPF until it is feature-equivalent to WASM, which so far has been a *lot* of work if you look at the history of the kernel's BPF code, we could just … well … link a WASM compiler into the kernel.

Reminds me of when I, in Linux's early days, was so fed up with the then-abysmal state of Linux networking that I linked the BSD network stack into it. It was not a particularly good fit, of course, but it worked.

The first half of the 6.3 merge window

Posted Feb 25, 2023 21:39 UTC (Sat) by Sesse (subscriber, #53779)

Is there a WASM verifier that can prove that a significant, useful class of WASM programs do halt? (That's much of the appeal of the BPF verifier, after all.)

The first half of the 6.3 merge window

Posted Feb 26, 2023 2:03 UTC (Sun) by Cyberax (✭ supporter ✭, #52523)

WASM runtimes allow limiting the runtime (or the number of instructions) of WASM programs. In practice, that has the same effect as the limits in BPF.

And I'm pretty sure it's very easy to make BPF programs run for quite some time, if you combine list lookups and function calls. The raw instruction count has stopped being a good predictor for the maximum BPF runtime.

The first half of the 6.3 merge window

Posted Feb 26, 2023 6:20 UTC (Sun) by Sesse (subscriber, #53779)

If you just stop a program in the middle, you've got a problem, though; you could leave the kernel in a badly inconsistent state. If you wrote a WASM scheduler, and it got terminated due to timeout, what process would you schedule? What if it held a lock?

The point isn't as much to avoid slowness as to have deterministic forward progress in the kernel.

The first half of the 6.3 merge window

Posted Feb 26, 2023 19:33 UTC (Sun) by Cyberax (✭ supporter ✭, #52523)

> If you just stop a program in the middle, you've got a problem, though; you could leave the kernel in a badly inconsistent state. If you wrote a WASM scheduler, and it got terminated due to timeout, what process would you schedule? What if it held a lock?

BPF doesn't guarantee this either. It has an early exit instruction (bpf_exit) that allows you to terminate the program earlier. It's entirely possible to take a lock and then do an early exit. Or to take two locks in the wrong order resulting in a deadlock, and the verifier will be happy. The only locking BPF allows is holding ONE spinlock at a time from a structure in a BPF map: https://lwn.net/Articles/779120/

A similar lock helper can be created for WASM via a simple helper that will release the lock on timeout.

With the scheduler example, BPF doesn't provide anything that can't be expressed in WASM. You can't express the invariant "BPF picks at least one process" in a way that the verifier understands.
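The timeout-plus-lock-release helper proposed in this comment can be modeled in a few lines of Python. This is a hypothetical toy, not the API of any real WASM runtime: the guest is charged one unit of "fuel" per step, locks are taken through a host helper that records them, and when the fuel runs out the host aborts the guest and releases whatever it still holds. As the comment in the finally block notes, this frees the lock but does nothing about data left half-updated, which is the objection raised earlier in the thread.

```python
import threading


class FuelExhausted(Exception):
    """Raised when the guest has used up its instruction budget."""


class MeteredGuest:
    """Toy fuel-limited guest with a host-side lock helper (hypothetical API)."""

    def __init__(self, fuel: int) -> None:
        self.fuel = fuel
        self.held: list[threading.Lock] = []

    def step(self) -> None:
        # Charge one unit of fuel per guest "instruction".
        if self.fuel == 0:
            raise FuelExhausted()
        self.fuel -= 1

    def take_lock(self, lock: threading.Lock) -> None:
        lock.acquire()
        self.held.append(lock)

    def drop_lock(self, lock: threading.Lock) -> None:
        self.held.remove(lock)
        lock.release()

    def run(self, program) -> bool:
        """Run program(self); return False if it was aborted for lack of fuel."""
        try:
            program(self)
            return True
        except FuelExhausted:
            return False
        finally:
            # Release leftover locks in reverse acquisition order. This frees
            # the locks, but data they protected may be left inconsistent.
            while self.held:
                self.held.pop().release()


lock = threading.Lock()


def runaway(guest: MeteredGuest) -> None:
    guest.take_lock(lock)
    while True:          # never finishes on its own
        guest.step()


completed = MeteredGuest(fuel=100).run(runaway)   # False; lock is released anyway
```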

The first half of the 6.3 merge window

Posted Feb 27, 2023 13:19 UTC (Mon) by kkdwivedi (subscriber, #130744)

> BPF doesn't guarantee this either. It has an early exit instruction (bpf_exit) that allows you to terminate the program earlier. It's entirely possible to take a lock and then do an early exit. Or to take two locks in the wrong order resulting in a deadlock, and the verifier will be happy. The only locking BPF allows is holding ONE spinlock at a time from a structure in a BPF map: https://lwn.net/Articles/779120/

The verifier does complain if you try to exit while holding a spinlock. Also, it's totally possible to support holding more than one lock at a time. Deadlock avoidance is a challenge, but there are some cases (which have a substantial overlap with common usage scenarios) where you can easily prove or enforce it statically. I think it has not been done yet because no strong use case came up, rather than some kind of fundamental limitation in BPF.

The first half of the 6.3 merge window

Posted Feb 27, 2023 20:46 UTC (Mon) by Cyberax (✭ supporter ✭, #52523)

> Also, it's totally possible to support holding more than one lock at a time.

The last time I checked, the verifier supported only one lock at a time.

> Deadlock avoidance is a challenge

I don't think it's even possible if BPF is allowed to use general-purpose locks that are used in other parts of the kernel. For the more restricted use case, it's possible to force lock ordering. But this will require runtime tracking to be useful; you can't have static verification for anything non-trivial.

The simplest way to do runtime tracking is to have a consistent numbering for locks and, when you take a lock, store the "lock tickets" in a linked list. This way you can verify that your previous lock has a greater number than the current one. It will still be somewhat limited (so no hand-over-hand locking), but it'll do for a large number of practical applications.

But this of course can be expressed as a simple API exposed to WASM code, just as with the BPF use case.
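The lock-ticket scheme described in this comment can be sketched as a small Python class. It is an illustration under stated assumptions (a fixed number per lock, strictly decreasing acquisition order, last-in-first-out release); the class and method names are invented here and do not correspond to any kernel or BPF API. Because the stack of held tickets always stays strictly decreasing, comparing against the most recent ticket alone is enough to enforce the global order.

```python
class LockOrderError(RuntimeError):
    """Raised when a lock is taken or released out of the permitted order."""


class LockTicketTracker:
    """Runtime lock-ordering check using per-lock numbers ("tickets").

    A new lock may be taken only if its number is lower than that of the most
    recently taken lock still held. Releases must be last-in-first-out, which
    rules out hand-over-hand locking.
    """

    def __init__(self) -> None:
        self._held: list[int] = []   # ticket stack, most recent last

    def acquire(self, lock_number: int) -> None:
        if self._held and self._held[-1] <= lock_number:
            raise LockOrderError(
                f"lock {lock_number} taken while holding lock {self._held[-1]}")
        self._held.append(lock_number)

    def release(self, lock_number: int) -> None:
        if not self._held or self._held[-1] != lock_number:
            raise LockOrderError(f"lock {lock_number} released out of order")
        self._held.pop()


tracker = LockTicketTracker()
tracker.acquire(10)
tracker.acquire(5)     # allowed: 5 < 10
```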

The first half of the 6.3 merge window

Posted Feb 24, 2023 7:32 UTC (Fri) by PengZheng (subscriber, #108006)

How would one avoid reinventing WASM (say, by reusing it) in this context?

RTL8188EU

Posted Feb 24, 2023 10:11 UTC (Fri) by georgm (subscriber, #19574)

Now that rtl8xxxu supports rtl8188eu chips, is there still a case for the r8188eu driver in staging?

Shared-media Ethernet is still a thing?

Posted Mar 2, 2023 14:52 UTC (Thu) by kpfleming (subscriber, #23250)

I'm tempted to read the docs for this PLCA thing just to find out what sort of hardware is being deployed in 2023 which supports shared-media networking.

Shared-media Ethernet is still a thing?

Posted Mar 4, 2023 18:49 UTC (Sat) by geofft (subscriber, #59789)

This blog post says the use case is "automotive Ethernet," where apparently you don't want to run a whole new Ethernet cable to a new switch port to add some new device inside a car, you just want to attach the device to the cable. Seems like this is the core feature of "10BASE-T1S."

Shared-media Ethernet is still a thing?

Posted Mar 4, 2023 20:28 UTC (Sat) by Cyberax (✭ supporter ✭, #52523)

I wonder when we're going to get back to vampiric taps and thicknet...

Shared-media Ethernet is still a thing?

Posted Mar 6, 2023 10:19 UTC (Mon) by farnz (subscriber, #17727)

It's basically a way to let you reuse a CANbus topology for Ethernet, so that you can entirely replace CANbus with Ethernet in your vehicle without paying a weight penalty.


Copyright © 2023, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds