Next steps for kernel workflow improvement
The meeting was organized and led by Konstantin Ryabitsev, who is in charge of kernel.org (among other responsibilities) at the Linux Foundation (LF). Developing the kernel by emailing patches is suboptimal, he said, especially when it comes to dovetailing with continuous-integration (CI) processes, but it still works well for many kernel developers. Any new processes will have to coexist with the old, or they will not be adopted. There are, it seems, some resources at the LF that can be directed toward improving the kernel's development processes, especially if it is clear that this work is something that the community wants.
Attestation
Ryabitsev's first goal didn't feature strongly at the Maintainers Summit, but is an issue that he has been concerned about for some time: improving attestation for patches so that recipients can be sure of their provenance. Currently, there is no attestation at all, so recipients have to trust that patches come from the developer whose name appears on them. We all assume that maintainers are watching carefully and can catch spoofed emails, but the truth of the matter is that it is relatively easy to sneak malicious code past a maintainer. So an attacker could conceivably find a way to add a vulnerability to the kernel.
The first problem to solve is thus, according to Ryabitsev, to fix attestation. Linus Torvalds does verify the signed tags that are associated with pull requests, he said, so that part of the process is taken care of. But there are no signatures on actual patches, and no consensus on how they might be added.
His proposal is to introduce signatures on emailed patches as well. The mechanism used would be minisign, not GnuPG; one of the big advantages of minisign is that the attached signatures are much shorter than those created by GnuPG. Steve Rostedt interrupted at this point to question the value of this approach; he said that an attack, to be successful, would have to involve a relatively complex patch written in a style that mimics that of the purported author. It would be a big effort, he said; anybody with the resources to do that could also crack the encryption scheme used for attestation.
Ryabitsev responded, though, that minisign is "real cryptography" and not easy to crack; there are far easier ways to get bad code into the kernel than breaking the encryption. The hard part with this scheme, instead, is with identity tracking. GnuPG, like PGP before it, is based on the "web of trust" idea, but the web of trust has proved to be mostly unworkable over the years and people are giving up on it. Newer schemes tend to be based, like SSH, on a "trust on first use" (or TOFU) model, where a new key is trusted (and remembered) when it is first encountered, but changes in keys require close scrutiny. He suggested using a TOFU approach in an attestation mechanism for Git as well.
Rafael Wysocki was also skeptical, asserting that this scheme does not solve the problem; it only moves it elsewhere. An attacker could create an identity and build trust over time before submitting something malicious; the proposed scheme adds complexity but doesn't really fix anything, he said. Ryabitsev disagreed, though; building trust requires time and resources, but an attacker could spoof a trusted developer now.
Frank Rowand asked whether maintainers would be expected to strip signatures before committing patches. The signature, Ryabitsev answered, would go below the "---" line in the changelog, so it would be automatically stripped at commit time. But the key used would also be noted in a local database and verified the next time a patch shows up from the same developer. Rostedt pointed out that one-time submitters would not have a key in this database; Ryabitsev replied that, since those developers are not coming back, it doesn't really matter. This scheme is about trusting ongoing developers.
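The local-database check described here is essentially SSH-style TOFU applied to patch submitters. A minimal sketch in Python, assuming a simple JSON key store; the file layout, key format, and return values are invented for illustration and are not part of Ryabitsev's actual proposal:

```python
# Sketch of a trust-on-first-use (TOFU) check for submitter keys.
# The JSON database layout here is hypothetical.
import json
import os

DB = os.path.expanduser("~/.patch-keys.json")

def check_key(email, pubkey):
    """Trust a submitter's key on first use; flag any later change."""
    db = json.load(open(DB)) if os.path.exists(DB) else {}
    known = db.get(email)
    if known is None:
        db[email] = pubkey              # first contact: remember the key
        with open(DB, "w") as f:
            json.dump(db, f)
        return "trusted-on-first-use"
    if known == pubkey:
        return "ok"                     # same key as last time
    return "KEY CHANGED - verify out of band"
```

On first contact the key is simply recorded; only a changed key demands human scrutiny, which is exactly the tradeoff of the TOFU model described above.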
He would like minisign-based attestation to become part of Git; tools like git format-patch would just add it automatically. Rowand pointed out that a lot of developers use relatively old versions of Git, so it would take years to roll this capability out to everybody. He said that GnuPG should be used instead; developers have it and the kernel's web of trust already exists. But Ryabitsev said that GnuPG is a poor tool for signing patches; the attached signature is often larger than the patch itself, and list archival mechanisms tend to strip it out. To be truly useful, signatures on patches need to be unobtrusive.
Like much of what was discussed in this meeting, signature use would be opt-in, at least initially. Ryabitsev is thinking about writing a bot that would watch the mailing lists and gently suggest to developers who are not using signatures that they might want to start. He asked the group whether this scheme as a whole was a good idea and got almost universal agreement (Rowand being the exception). So he intends to try to get the needed support added to Git.
Base-tree information
A common question asked of patch submitters is: "which tree was this made against?". That information is often needed to successfully apply a patch, and CI systems need it to be able to do automatic testing. But that "base-tree information" is not included with patches currently; fixing that is high on many developers' wish lists. Dmitry Vyukov asked whether it would be better to add this feature to Git and wait for it to be adopted, or to create a wrapper script that developers could use now. It turns out, though, that the --base option works in Git now; it's just a matter of getting submitters to use it. Vyukov agreed that this is the hardest part; he suggested creating a wrapper that would supply this option automatically.
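The --base option is indeed present in current Git; a quick way to see the trailer it produces is to build a throwaway repository. The file names and commit messages below are made up for the demonstration:

```python
# Demonstrate git format-patch --base in a scratch repository.
import os
import subprocess
import tempfile

def git(*args, repo):
    """Run a git subcommand in the given repository, returning stdout."""
    return subprocess.run(("git", "-C", repo) + args, check=True,
                          capture_output=True, text=True).stdout

repo = tempfile.mkdtemp()
ident = ("-c", "user.name=Dev", "-c", "[email protected]")
git("init", "-q", repo=repo)
git(*ident, "commit", "-q", "--allow-empty", "-m", "base", repo=repo)
base = git("rev-parse", "--abbrev-ref", "HEAD", repo=repo).strip()
git("checkout", "-q", "-b", "topic", repo=repo)
with open(os.path.join(repo, "file.c"), "w") as f:
    f.write("int x;\n")
git("add", "file.c", repo=repo)
git(*ident, "commit", "-q", "-m", "add file", repo=repo)
# --base records the series' starting point as a trailer in the patch
git("format-patch", "--base=" + base, "-o", "out", base + "..topic", repo=repo)
outdir = os.path.join(repo, "out")
patch = open(os.path.join(outdir, os.listdir(outdir)[0])).read()
print([l for l in patch.splitlines() if l.startswith("base-commit:")])
```

The generated patch file ends with a "base-commit:" trailer naming the commit the series was built on, which is the information CI systems are missing today.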
There was a bit of a side discussion on whether Torvalds would complain about the base-tree information, as he does when tags like Change-id show up in patches. The problem, though, is not really the extra tag, it's the perceived uselessness of the information. If the base-tree information is useful, there should not be complaints.
It was pointed out that the base-tree information might not always be helpful to others; that base could be in a private tree, for example. At other times, though, it could be useful indeed. Rostedt pointed out that the "tip" tree used for x86 (and beyond) maintenance has a dozen or so branches in it; knowing which branch a patch applies to would be helpful. Everybody seemed to agree that this information should be there, and that the checkpatch.pl script should be made to check for it. There may eventually be a bot to nag developers who omit this information from their patches, but care would have to be taken to prevent it from generating too much noise.
Beyond email
For a number of reasons, requiring all kernel patches to be sent by email looks like a policy with a limited future. Switching to a "forge" service, along the lines of GitHub or GitLab, is an idea without universal appeal, though, especially in the short term. But there is desire for a solution that could let some developers move beyond email while maintaining the current workflow overall. The first step in that direction is likely to be some sort of Git-to-email bridge. Ryabitsev pointed out, though, that there is no consensus on what such a bridge might look like.
One option could be a special Git repository that developers could push to; any patch series pushed there would be turned into a series of emails and sent to the appropriate addresses. Ryabitsev does not like that idea, though; any such system would be a central point of failure that could go down at inopportune times. Another option would be some sort of web service that could be pointed at a public repository; once again, it would generate an email series and submit it. This solution falls down in another way, though: it is unable to support attestation. A third alternative is to create a command-line tool that can turn a pull request into an emailed series.
There are a number of hard problems to be solved here, he said, with many tradeoffs to be considered. But the easiest solution appears to be the command-line tool, perhaps integrated with a tool like GitGitGadget. There is also a tool under development at sourcehut that is worth a look. He might support such a tool by exposing an SMTP service specifically for mailing patches to kernel.org addresses.
That led to the concept of "feeds" — services that provide access to patches and more. The lore.kernel.org service has been running for a while now; it has quickly become an indispensable part of the kernel development process. Ryabitsev would, though, like to create something with similar functionality that does not need a mailing list behind it. Developers could use it to create their own patch feeds; CI systems could also export feeds describing the tests they have run and the results. Then it would be possible to, for example, automatically annotate patches with data on how they have been tested and by whom. Bots could use this information to figure out which tests they should run, avoiding those that have already been run elsewhere. Feeds would be archived and mirrored so they could be checked years in the future. Feeds would be able to support attestation, record Acked-by tags, and more.
But that still leaves the problem of actually creating all of this tooling and making it easy to use. Nobody is going to want all of these feeds in their inbox, so it will have to be possible to establish filters. Size also matters: lore.kernel.org currently requires about 200GB of disk space, which is a bit unwieldy to download to one's laptop. But lore contains a lot of ancient history that developers will not normally need, so the database could be much smaller.
Ryabitsev is currently working with the maintainer of public-inbox on the development of some of these tools. There is, he said, some development time that is available at the LF over the next six months; what should he aim to achieve in that time? Building something with Docker would be convenient for many, but the "old-school developers" don't want to deal with Docker. Should it be a command-line or web-based tool? Fans of command-line tools tend to be more vocal, but that does not mean that they are a majority.
Perhaps, he said, the way to start would be to make it easy to set up a local Patchwork instance. There was a wandering discussion on how subsystems with group maintainership could be supported, but that is not a problem that can be solved in the next six months, he said. Further discussion on how the tools should be developed was deferred to the kernel workflows mailing list.
As time ran out there was some quick discussion of CI systems, including GitLab, Gerrit, and more. The kernel clearly needs more CI testing, so Ryabitsev wants to be sure that it is all integrated into any new tooling. He would like to be able to provide a feed describing what each of these systems is doing. These forge systems mostly provide an API for event data now; what is needed is a set of translator bots that could pull those events together into a public-inbox feed for anybody who is interested. CI systems would be able to consume this data, and others could follow it without having to have an account on each CI system.
The emails sent by CI systems now are just noise to many recipients, he said; as more of these systems come online that problem will get worse. Creating a feed solves the problem by putting CI reports where only the people who want them have to see them. It is a hard thing to do well, he said, and he is not sure how his solution will work, but he wants to try. Email is a terrible way to integrate with systems that need structured data, so he's looking to replace the email message bus with a more structured, feed-based system.
The session broke up with a statement that, if the community asks for this kind of tooling, there is a real possibility that the LF will find a way to fund its development.
See also: Han-Wen Nienhuys's notes from the meeting.
[Thanks to the Linux Foundation, LWN's travel sponsor, for supporting your
editor's travel to the event.]
Index entries for this article
Kernel: Development tools
Conference: Open Source Summit Europe/2019
Posted Nov 1, 2019 17:06 UTC (Fri)
by tshow (subscriber, #6411)
Posted Nov 1, 2019 17:16 UTC (Fri)
by jake (editor, #205)
No, I'm pretty sure that was not what we meant to do :)
Fixed now, thanks!
jake
Posted Nov 1, 2019 17:44 UTC (Fri)
by q_q_p_p (guest, #131113)
Posted Nov 1, 2019 19:20 UTC (Fri)
by mathstuf (subscriber, #69389)
Posted Nov 6, 2019 5:00 UTC (Wed)
by marcH (subscriber, #57642)
This is one of the main reasons absolutely every code review solution (and every "firehose" solution, for that matter) relies on some sort of *database* featuring all the typical bells and whistles: [live] queries, cross-references, statistics, [CI] triggers, filters, notifications, authentication, etc.
The web vs CLI question is important but secondary, it's "just" the interface and many code review solutions offer both to some degree.
Now what is exciting here is allusions to some _distributed_ database model? Who knows, this could revolutionize code reviews like bitkeeper decentralized and revolutionized version control...?
Next: distributed bug tracking? OK, maybe that wouldn't be useful.
Posted Nov 1, 2019 20:11 UTC (Fri)
by logang (subscriber, #127618)
My vague ideas for features in git would be:
* Support the entire flow for sending git patches inside git itself. This means branches need first class ways of storing cover letters, versions, recipient lists, etc. Instead of needing to do: format-patch, write cover letter, figure out send lists, notice a mistake, format patch, copy over cover-letter, send. It would be nice if git just stored all this with the branch and all you needed to do was 'git send' when it's all ready.
* Support for easily importing patchsets from a mailbox into branches, with the cover letter and recipient lists. (Obviously this will need to solve the base-tree information problem first, possibly by including public repos that already have the base commits with the patches).
* Support for reviewing a patchset inside git itself and having the review comments sent via email to everyone on the recipient list and author, etc.
* Support for branch queues: if people are now importing tons of branches into their repos from their mailboxes, then they need some way of organizing these branches and determining which need attention next
* If the above features start being used by a majority, maybe then git could start to allow different transports other than email. So imagine a .git_maintainers file that contains a mapping of email addresses to desired transport. If the recipient's address isn't in this file, it simply falls back to email. A new transport might simply be that instead of emailing the patches they get pushed to a specified branch queue in a world-writable git repo. Sadly, this likely means that git will need to support some spam mitigations too.
* After that, interested parties could probably write a github-like web service that just provides a new front end for git's existing features. Then maintainers that want this could set it up for themselves, or kernel.org could offer this for maintainers that want it.
Posted Nov 3, 2019 21:14 UTC (Sun)
by rodgerd (guest, #58896)
It's a pity Fossil isn't better-known; they already seem to have solved a lot of these problems.
Posted Nov 4, 2019 14:17 UTC (Mon)
by mathstuf (subscriber, #69389)
For those curious, I care because we vendor sqlite which is based on using git repositories for patch tracking before we import the code and then enforce that all changes for the vendoring process are tracked in that repository. Inb4 "vendoring is bad": there's an option to use an existing sqlite, but…Windows.
[1]https://repo.or.cz/sqlite-export.git
Posted Nov 4, 2019 17:26 UTC (Mon)
by logang (subscriber, #127618)
Frankly, I think the fossil model is not useful for most open source projects. They don't really have a convincing story for drive-by contribution nor scaling a community. And they pretty much state outright that it would not be suitable for the kernel development model:
>The Linux kernel has a far bigger developer community than that of SQLite: there are thousands and thousands of contributors to Linux, most of whom do not know each others names. These thousands are responsible for producing roughly 89⨉ more code than is in SQLite. (10.7 MLOC vs. 0.12 MLOC according to SLOCCount.) The Linux kernel and its development process were already uncommonly large back in 2005 when Git was designed, specifically to support the consequences of having such a large set of developers working on such a large code base.
>95% of the code in SQLite comes from just four programmers, and 64% of it is from the lead developer alone. The SQLite developers know each other well and interact daily. Fossil was designed for this development model.
Posted Nov 3, 2019 10:23 UTC (Sun)
by daniels (subscriber, #16193)
Posted Nov 1, 2019 21:58 UTC (Fri)
by estansvik (subscriber, #127963)
Posted Nov 1, 2019 23:35 UTC (Fri)
by jgg (subscriber, #55211)
I wonder how practical an email impersonation attack is? Maybe we should start by strengthening DKIM checking in patchworks and related?
Posted Nov 3, 2019 10:21 UTC (Sun)
by daniels (subscriber, #16193)
Posted Nov 3, 2019 12:58 UTC (Sun)
by pabs (subscriber, #43278)
http://arc-spec.org/
Posted Nov 4, 2019 17:29 UTC (Mon)
by jgg (subscriber, #55211)
Posted Nov 3, 2019 15:20 UTC (Sun)
by tdz (subscriber, #58733)
I think that patchwork already is the answer. It just needs a lot more features and a better UI.
Posted Nov 5, 2019 14:04 UTC (Tue)
by waver12 (guest, #112812)
Posted Nov 7, 2019 15:22 UTC (Thu)
by kpfleming (subscriber, #23250)
Posted Nov 8, 2019 19:16 UTC (Fri)
by error27 (subscriber, #8346)
So sending patches by email does work right now, but the git commands to send a patch are pretty complicated.
$ git format-patch HEAD~
Probably someone should make an interactive helper script called ./scripts/send_patch <hash> which does it automatically. It could run checkpatch etc.
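A dry-run sketch of such a helper is below, assuming the usual in-tree script paths (./scripts/checkpatch.pl and ./scripts/get_maintainer.pl); it only assembles the commands rather than executing them, since running them requires a kernel tree:

```python
# Sketch of a send_patch helper: build the checkpatch / get_maintainer /
# send-email pipeline for one patch file (dry run only).
import shlex

def send_patch_commands(patch, extra_cc=()):
    """Return the shell commands a send_patch wrapper would run."""
    cmds = [
        ["./scripts/checkpatch.pl", patch],
        ["git", "send-email",
         "--cc-cmd=./scripts/get_maintainer.pl --norolestats " + patch],
    ]
    for cc in extra_cc:
        cmds[1].insert(2, "--cc=" + cc)   # e.g. stable@ for fixes
    cmds[1].append(patch)
    return [" ".join(map(shlex.quote, c)) for c in cmds]
```

An interactive version would stop on checkpatch warnings and show the recipient list for confirmation before anything is sent.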
Posted Nov 12, 2019 10:10 UTC (Tue)
by geert (subscriber, #98403)
Posted Nov 13, 2019 18:09 UTC (Wed)
by error27 (subscriber, #8346)
I manually remove Greg, Kees and Colin if it's not something I think they care about. But those people are used to getting tons of mail so they don't mind either way.
Posted Nov 25, 2019 22:49 UTC (Mon)
by dkg (subscriber, #55359)
if the goal is small signatures, GnuPG (or any other OpenPGP implementation) using modern cryptography (e.g. Curve25519) produces a signature object that is about 119 octets. If you ASCII-armor it, it's up to 228 octets. None of this is even close to a quarter of the standard MTU, and barely noticeable among (for example) the Received: e-mail headers that will get attached to the message while it's in transit.
If the goal is simplicity, then mandating a specific profile of OpenPGP is the way to go -- the existing tooling already offers cryptographic validation, just tell people that they need to use Curve25519, and not the larger RSA keys. Modern elliptic curves have been available in gpg in Debian for years now (in stable since the release of Debian 9 ("stretch") back in June 2017).
But perhaps the bigger questions are: who is verifying these signatures, at what stage, where are they recording these verifications, what do they do if the verifications are missing, how do we use these verifications going forward?
None of these harder questions are affected by a proposal to switch from OpenPGP to minisign, as far as i can tell, and minisign adds an additional technical hurdle of deployment.
I should be clear: I'm a fan of moving to a cryptographically-strong attestation model. I'd love to see us as a community grapple with what that really means, and with what we want from it. I don't think that switching from OpenPGP to minisign addresses any of the deeper underlying issues, though, and it seems like it might be a distraction instead.
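The size claim above is easy to check back-of-the-envelope: an Ed25519 signature is 64 raw octets, and base64 encoding alone brings that to 88 characters, so roughly 119 octets of binary OpenPGP packet and roughly 228 octets armored are plausible once framing is added:

```python
# Rough size check for the signature figures quoted above.
import base64

raw = 64                                    # Ed25519 signature: r || s
encoded = len(base64.b64encode(bytes(raw)))
print(encoded)  # 88: the bulk of an armored signature, before packet framing
```

Either way, both figures are a tiny fraction of a typical patch email, which supports the point that the choice of tool matters less than the harder verification questions.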
* And once there's a majority using this flow, adding structured data or tags from CI bots should be a bit easier because it's just a matter of changing the tooling everyone already uses.
GitLab at least has a comprehensive API which can be used to pull the feed of recent events, create/modify/etc merge requests and comments on them, and so on, from the client of your choice. There are standalone CLI clients and rich bindings for whichever language you care to use it from. That's true of most web-based services created in the last 5-10 years.
https://en.wikipedia.org/wiki/Authenticated_Received_Chain
$ git send-email --cc-cmd='./scripts/get_maintainer.pl --norolestats 0001-my.patch' --cc [email protected] 0001-my.patch
BTW, doesn't "git send-email" add "[email protected]" automatically, based on the "From:" in the patch? Or is this need a side effect of using "--cc-cmd"?
It's not clear to me what specific gain Ryabitsev hopes to get from moving from OpenPGP signatures to minisign signatures.
and so on…