Network filtering for control groups
The objective is to apply a filter to network traffic going to or from any process contained within a given cgroup. The intent may be to improve security by restricting the traffic that a particular system service or application (contained within its own cgroup) can generate, or it may be a desire for simple resource control or accounting. Either way, the point is to have this control at the cgroup level, something that the kernel does not currently support.
One possible solution, posted by Daniel Mack, is to allow a BPF program to be attached to a cgroup. To that end, the bpf() system call is extended with a new BPF_PROG_ATTACH operation. Exactly what the program is attached to depends on the type of the program; for now the only type supported is BPF_PROG_TYPE_CGROUP_SOCKET_FILTER, but the possibility exists that other types (to make other sorts of policy decisions for cgroups) could be supported in the future. Programs may be attached as either an ingress or an egress filter, controlled by a flag passed to the bpf() call. Naturally, there is also a BPF_PROG_DETACH operation to remove a BPF program from a cgroup.
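As a rough illustration of what such an attach operation looks like from user space, here is a minimal sketch. It does not use the names from the patch set discussed here; instead it assumes the form in which cgroup attachment appears in current mainline headers, where `<linux/bpf.h>` provides `BPF_PROG_ATTACH`, `BPF_PROG_DETACH`, the `target_fd`, `attach_bpf_fd`, and `attach_type` fields of `union bpf_attr`, and the `BPF_CGROUP_INET_INGRESS` and `BPF_CGROUP_INET_EGRESS` attach types. The wrapper names `sys_bpf()`, `cgroup_filter_attach()`, and `cgroup_filter_detach()` are hypothetical.

```c
#define _GNU_SOURCE
#include <linux/bpf.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical thin wrapper; the C library does not export bpf() itself. */
long sys_bpf(int cmd, union bpf_attr *attr)
{
    return syscall(__NR_bpf, cmd, attr, sizeof(*attr));
}

/*
 * Attach an already-loaded program (prog_fd) to a cgroup (cgroup_fd).
 * "ingress" selects the direction; the field and constant names are
 * those of mainline headers, assumed here rather than taken from the
 * patch set under discussion.
 * Returns 0 on success, -1 with errno set on failure.
 */
long cgroup_filter_attach(int cgroup_fd, int prog_fd, int ingress)
{
    union bpf_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.target_fd = cgroup_fd;
    attr.attach_bpf_fd = prog_fd;
    attr.attach_type = ingress ? BPF_CGROUP_INET_INGRESS
                               : BPF_CGROUP_INET_EGRESS;
    return sys_bpf(BPF_PROG_ATTACH, &attr);
}

/* Remove whatever program is attached to the cgroup in that direction. */
long cgroup_filter_detach(int cgroup_fd, int ingress)
{
    union bpf_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.target_fd = cgroup_fd;
    attr.attach_type = ingress ? BPF_CGROUP_INET_INGRESS
                               : BPF_CGROUP_INET_EGRESS;
    return sys_bpf(BPF_PROG_DETACH, &attr);
}
```

In this form, the cgroup descriptor would typically come from calling open() on the cgroup's directory, and the program descriptor from an earlier BPF_PROG_LOAD call. Both calls fail with -1 if either descriptor is invalid or the caller lacks the needed privileges.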
Once the program is attached, it will be run on each packet sent to or from a process in the cgroup, depending on how it was attached — though only the ingress side is implemented in the current patch set. If the program returns one, the packet will be allowed to pass; otherwise it will be dropped.
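The verdict rule can be shown with a small user-space model. This is only a sketch of the semantics described above, not kernel code: `struct packet`, `struct cgroup_model`, and the `model_*()` functions are hypothetical names, both directions are modeled symmetrically, and the choice to pass every packet when no program is attached is an assumption of the model.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a packet; a real filter sees a socket buffer. */
struct packet {
    uint16_t dst_port;
    size_t len;
};

/* A filter program: returning 1 lets the packet pass, anything else drops it. */
typedef int (*filter_fn)(const struct packet *pkt);

enum direction { DIR_INGRESS = 0, DIR_EGRESS = 1 };

/* Hypothetical cgroup model: one program slot per direction. */
struct cgroup_model {
    filter_fn filter[2];
};

/* Attaching replaces whatever program occupied the slot before. */
void model_attach(struct cgroup_model *cg, enum direction dir, filter_fn prog)
{
    cg->filter[dir] = prog;
}

void model_detach(struct cgroup_model *cg, enum direction dir)
{
    cg->filter[dir] = NULL;
}

/* Returns 1 if the packet is delivered, 0 if it is dropped. */
int model_run(const struct cgroup_model *cg, enum direction dir,
              const struct packet *pkt)
{
    filter_fn prog = cg->filter[dir];

    if (prog == NULL)
        return 1; /* assumption: with no program attached, nothing is filtered */
    return prog(pkt) == 1;
}

/* Example policy: allow only traffic destined for port 443. */
int allow_https_only(const struct packet *pkt)
{
    return pkt->dst_port == 443 ? 1 : 0;
}
```

With `allow_https_only` attached as the ingress filter, a packet for port 443 is delivered and one for port 22 is dropped, while egress traffic is unaffected until a program is attached in that direction as well.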
The idea is thus relatively straightforward; it is similar to the socket filters that an individual process can already apply to a socket it owns. Cgroup maintainer Tejun Heo had some quibbles with the implementation, but had no real objection to the overall design. It seems like something that could be added without a whole lot of trouble, except that one developer has different ideas.
That developer is Pablo Neira Ayuso, the maintainer of the netfilter subsystem. Perhaps unsurprisingly, he thinks that the proper solution is based on netfilter rather than BPF; in particular, he would like to see the establishment of a special table of rules that could be attached to a cgroup. In his opinion, a set of rules that can be queried with existing tools would be easier for administrators to deal with than a relatively opaque BPF program. Multiple sets of netfilter rules can be composed, while the BPF approach only allows for a single program to be attached to a cgroup, limiting flexibility in situations where more than one entity wants to add filtering rules. A netfilter-based approach could also take advantage of the connection tracking that, likely, is already being done, speeding the processing of most packets. Those reasons, he says, make netfilter the better tool for this particular job.
Daniel acknowledged the downsides of the BPF implementation, though he was less convinced about the importance of some of them. It seems that this project was looking at a netfilter-based solution early on, but chose to refocus on BPF. There were concerns that the netfilter developers did not actually want a cgroup-level hook, and that the performance of the netfilter system might not be up to the task. He summarized things this way:
Even so, he said, he would be willing to look again at a solution based on netfilter, especially if Pablo were willing to help with the implementation — something that Pablo said he could do. BPF developer Alexei Starovoitov was rather less impressed, suggesting that a netfilter-based solution should be considered as a separate facility in the future, if a way can be found to implement it without slowing things down too much.
And that is where the discussion stands as of this writing. In a sense, netfilter and BPF were always destined to come into conflict at some point; both are, in essence, mechanisms for loading packet-filtering policy into the kernel. Even if this particular disagreement is solved without undue drama, this question is likely to come up again in other contexts. Thus far, there seem to be few bounds on places where BPF may be applicable but, perhaps, it still isn't the solution to every policy problem that comes along.
Index entries for this article | |
---|---|
Kernel | BPF/Networking |
Kernel | Control groups |
Kernel | Networking/Packet filtering |
Posted Aug 25, 2016 11:58 UTC (Thu)
by smurf (subscriber, #17840)
Posted Aug 25, 2016 16:49 UTC (Thu)
by Cyberax (✭ supporter ✭, #52523)
Posted Aug 25, 2016 17:04 UTC (Thu)
by nybble41 (subscriber, #55106)
Isn't that what nftables was supposed to do?
I see that nftables uses its own VM rather than eBPF, but the main objection to just using eBPF seemed to be simply that with eBPF you can only replace the entire program, not individual rules. It appears to me that this could be handled by treating the nftables VM as an intermediate language and employing a user-mode helper program to compile the rules down to eBPF whenever they change.
The same mechanism would presumably integrate well with this new infrastructure to attach an eBPF filter to a control group.
Posted Oct 11, 2016 7:14 UTC (Tue)
by RamiRosen (guest, #37330)
I want to add in this context that cgroups are also used for accounting of resource usage, which is an important part of their role.
Rami Rosen