Grabbing file descriptors with pidfd_getfd()
It is already possible, in current kernels, to open a file that another process also has open; the information needed to do that is in each process's /proc directory. That does not work, though, for file descriptors referring to pipes, sockets, or other objects that do not appear in the filesystem hierarchy. Just as importantly, opening a new file in this way creates a new entry in the file table; it is not the entry corresponding to the file descriptor in the process of interest.
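As a rough illustration of the /proc approach, the sketch below reopens another process's file through its /proc/PID/fd entry. The helper name open_via_proc() is made up for this example, and the usual permission checks on /proc still apply. The descriptor it returns refers to a fresh file-table entry, so its file offset starts at zero no matter where the original descriptor's offset sits.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: open the file that descriptor `fd` of process
 * `pid` refers to, by way of /proc/PID/fd/FD. Returns a new descriptor
 * in the caller, or -1 with errno set by open().
 */
int open_via_proc(pid_t pid, int fd, int flags)
{
    char path[64];

    /* Build the procfs path, e.g. "/proc/1234/fd/5". */
    snprintf(path, sizeof path, "/proc/%ld/fd/%d", (long)pid, fd);

    /* This is an ordinary open(): it creates a new file-table entry. */
    return open(path, flags);
}
```

Seeking or changing file-status flags on the descriptor returned here has no effect on the descriptor held by the other process.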
That distinction matters if the objective is to modify that particular file descriptor. One use case mentioned in the patch series is using seccomp to intercept attempts to bind a socket to a privileged port. A privileged supervisor process could, if it so chose, grab the file descriptor for that socket from the target process and actually perform the bind — something the target process would not have the privilege to do on its own. Since the grabbed file descriptor is essentially identical to the original, the bind operation will be visible to the target process as well.
For the sufficiently determined, it is actually possible to extract a file descriptor from another process now. The technique involves using ptrace() to attach to that process, stop it from executing, inject some code that opens a connection to the supervisor process and sends the file descriptor via an SCM_RIGHTS datagram, and then run that code. This solution might justly be said to be slightly lacking in elegance. It also requires stopping the target process, which is likely to be unwelcome.
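The SCM_RIGHTS step on its own, leaving out the ptrace() injection entirely, might look something like the following sketch. It assumes an already-connected Unix-domain socket; the helper names send_fd() and recv_fd() are invented for illustration and are not taken from the patch series.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Pass descriptor `fd` across the Unix-domain socket `sock`. Returns 0 or -1. */
int send_fd(int sock, int fd)
{
    char byte = 'x';   /* one byte of ordinary data to carry the control message */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {            /* the union keeps the control buffer suitably aligned */
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } u;
    struct msghdr msg;

    memset(&msg, 0, sizeof msg);
    memset(&u, 0, sizeof u);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof u.buf;

    /* Fill in a single SCM_RIGHTS control message holding the descriptor. */
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor from `sock`. Returns the new local descriptor or -1. */
int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } u;
    struct msghdr msg;
    int fd;

    memset(&msg, 0, sizeof msg);
    memset(&u, 0, sizeof u);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof u.buf;

    if (recvmsg(sock, &msg, 0) <= 0)
        return -1;

    /* Accept only a well-formed SCM_RIGHTS message carrying exactly one int. */
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;

    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

The received descriptor shares the sender's file-table entry, which is what makes this route usable where reopening through /proc is not. The cost described above lies in getting the target to run send_fd() in the first place.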
This functionality, without the need to stop the target process, is relatively easy to implement in the kernel, though; a supervisor process would merely need to make a call to:
int pidfd_getfd(int pidfd, int targetfd, unsigned int flags);
The target process is specified by pidfd (which is, as one might expect, a pidfd, presumably obtained when the process was created). The file descriptor to grab is given by targetfd; if all goes well, the return value will be a local file-descriptor number corresponding to the target process's file. For all to go well, the calling process must have the ability to call ptrace() on the target process.
The flags argument is currently unused and must be zero. There are, evidently, plans to add flags in the future, though. One would cause the file descriptor to be closed in the target process after being copied to the caller, thus truly "stealing" the descriptor from the target. Another would remove any related control-group data from socket file descriptors during the copy operation.
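A call along these lines could be sketched as follows, assuming a kernel that provides pidfd_getfd() and invoking it through syscall() in case the C library has no wrapper. The fallback syscall numbers, the wrapper name pidfd_getfd_sketch(), and the helper grab_fd_from_pid() are assumptions made for this example. The helper obtains its pidfd with pidfd_open() rather than at process creation, purely to keep the sketch short.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Fallback numbers for headers that predate these calls. Assumption:
 * 434 and 438 are the values on x86-64 and the other architectures
 * that share the common syscall table.
 */
#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434
#endif
#ifndef SYS_pidfd_getfd
#define SYS_pidfd_getfd 438
#endif

/* Thin wrapper around the raw system call; flags must currently be zero. */
int pidfd_getfd_sketch(int pidfd, int targetfd, unsigned int flags)
{
    return (int)syscall(SYS_pidfd_getfd, pidfd, targetfd, flags);
}

/*
 * Hypothetical convenience helper: copy descriptor `targetfd` of process
 * `pid` into the caller. Returns a new local descriptor, or -1 with errno
 * set (EPERM without ptrace access to the target, EBADF for a bad
 * descriptor, ENOSYS on a kernel lacking the call).
 */
int grab_fd_from_pid(pid_t pid, int targetfd)
{
    int pidfd = (int)syscall(SYS_pidfd_open, pid, 0U);
    if (pidfd < 0)
        return -1;

    int fd = pidfd_getfd_sketch(pidfd, targetfd, 0);

    /* Drop the pidfd without clobbering the errno from the grab. */
    int saved = errno;
    close(pidfd);
    errno = saved;
    return fd;
}
```

In the privileged-port scenario, a supervisor could call grab_fd_from_pid() with the target's PID and socket descriptor number, then bind() the returned descriptor on the target's behalf.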
This patch set has been through an impressive number of versions — and a fair amount of evolution — since it was first posted on December 5. The initial version added a new PTRACE_GETFD command to ptrace(). Version 3 switched to an ioctl() operation on a pidfd instead. In version 5, fifteen days after the initial posting, this functionality moved into a separate system call. The current posting is version 9.
From the beginning there has not been much concern about the goals behind this feature; the comments have mostly focused on the implementation. At this point, patch author Sargun Dhillon would appear to have just about exhausted the set of possible implementations, though some might be justified in thinking that a BPF version in the near future is inevitable. Failing that, this new system call may well be on track for the 5.6 or 5.7 merge window.
Index entries for this article | |
---|---|
Kernel | System calls |
Posted Jan 9, 2020 18:44 UTC (Thu)
by NYKevin (subscriber, #129325)
[Link] (6 responses)
> For the sufficiently determined, it is actually possible to extract a file descriptor from another process now. The technique involves using ptrace() to attach to that process, stop it from executing, inject some code that opens a connection to the supervisor process and sends the file descriptor via an SCM_RIGHTS datagram, then running that code. This solution might justly be said to be slightly lacking in elegance. It also requires stopping the target process, which is likely to be unwelcome.
On first read, I found this rather confusing. Surely the sandboxed process would be able to open that AF_UNIX connection itself, right?
But no, because they're not talking about a sandboxed process that is cooperating with the supervisor. They're (I think) talking about a sandboxed process that is ignorant of its sandbox and thinks it can "just call bind(2)." In that case, you actually need to intercept that call and emulate it outside the sandbox, without the sandboxed process noticing.
What bothers me most, however, is that this still feels like an antiquated system design. In the great before-times, inetd would spawn your server with the socket already hooked to stdin, and you wouldn't need to think about calling bind() or indeed any part of the sockets interface. While there are obvious scalability concerns with that approach, I still believe that binding sockets (to well known ports) ought to be something that is handled by system infrastructure and not separately by each individual server.
Posted Jan 9, 2020 19:04 UTC (Thu)
by Karellen (subscriber, #67644)
[Link] (3 responses)
> I still believe that binding sockets (to well known ports) ought to be something that is handled by system infrastructure and not separately by each individual server.
So, like sd_listen_fds()?
Posted Jan 10, 2020 13:39 UTC (Fri)
by miquels (guest, #59247)
[Link] (2 responses)
Or things like authbind and innbind?
Posted Jan 10, 2020 14:29 UTC (Fri)
by Karellen (subscriber, #67644)
[Link] (1 responses)
Thanks for pointing to those!
However, I'd have reservations about using authbind - LD_PRELOAD is handy for debugging and trying weird tricks out, but I'm wary about using it in production systems.
innbind looks much cleaner, and certainly would allow you to write a program that could bind to privileged ports without needing to run as root, but as far as I can tell it allows any program on the system to bind privileged ports. If you installed it so that only members of a specific group were able to run it, and limited which programs ran as members of that group, that could work.
Posted Jan 10, 2020 15:14 UTC (Fri)
by nix (subscriber, #2304)
[Link]
Posted Jan 11, 2020 23:12 UTC (Sat)
by rra (subscriber, #99804)
[Link] (1 responses)
Back when that was added to UNIX's security model, there were a wealth of programs that used the ability to bind to specific ports as an authorization control of various kinds (remember identd?). Most of those protocols are thoroughly obsolete (I hope no one is using traditional rlogin with rhosts authentication these days), so protecting those ports doesn't serve the same purpose.
I would argue that, today, the security concern is preventing programs from grabbing ports they're not "supposed" to have, but that problem is not limited to ports under 1024 except by history and convention. There are a lot of services that listen to ports above 1024 where some race condition allowing a user process to bind to that port is equally problematic.
It feels like a more useful security primitive now would be controlling the specific ports to which a process can bind, which looks more like socket activation (as you describe), or like a container where the process can bind to any port it wants but only expected ports are routed outside the container, so binding to other ports is futile.
Posted Jan 13, 2020 9:24 UTC (Mon)
by cortana (subscriber, #24596)
[Link]
Posted Jan 9, 2020 18:57 UTC (Thu)
by zblaxell (subscriber, #26385)
[Link] (2 responses)
That sounds messy--the FD could end up being used again by an open in some other thread of the target process, causing hilarious confusion on the target side if the target is not expecting FD thievery.
Why not do an atomic FD swap?
int stolen_fd = pidfd_swapfd(int pid_fd, int target_fd, int flags, int caller_fd)
Set caller_fd = NOFD if you really want the FD closed in the target process; otherwise, the caller's caller_fd becomes the target's target_fd, while the former target's target_fd is returned in stolen_fd.
Set target_fd = NOFD to copy caller_fd to the target process, assigning a new FD as if the target process had performed an open(). The new FD number in the target is returned in stolen_fd.
caller_fd isn't closed in the calling process--close() is fine for that.
Posted Jul 27, 2021 8:13 UTC (Tue)
by taladar (subscriber, #68407)
[Link] (1 responses)
Posted Jul 27, 2021 15:03 UTC (Tue)
by mathstuf (subscriber, #69389)
[Link]
[1] In code, that would be "sanity checks on top of the language guarantees". IMO, it's just normal defensive coding and the amount you put in depends on how paranoid you tend (or need) to be.
Posted Jan 9, 2020 21:23 UTC (Thu)
by roc (subscriber, #30627)
[Link]
https://github.com/mozilla/rr/blob/79eea40fe0d496abb6fcb0...
It's not nice, especially because we want it to work whether the tracee is 64-bit or 32-bit.
Of course it will be years before the new syscall is widely deployed enough that we can actually rip out our code, but ... progress.
Posted Jan 10, 2020 20:53 UTC (Fri)
by kylebot (guest, #134772)
[Link] (1 responses)
Then what's the difference between these two methods?
Posted Jan 10, 2020 22:30 UTC (Fri)
by cyphar (subscriber, #110703)
[Link]
Posted Jan 14, 2020 14:52 UTC (Tue)
by dona73110 (guest, #113155)
[Link] (1 responses)
You sure can open a pipe that another process has open, by opening /proc/PID/fd/FD ... open(2) opens the actual files that these symlinks represent, which in the case of deleted files or pipes, etc, do not correspond to the path in the symlink target returned by readlink.
Posted Jan 29, 2020 1:35 UTC (Wed)
by cyphar (subscriber, #110703)
[Link]