Description
Okay, I've figured it out. This is really dumb. tl;dr: This is really an AppArmor bug (or even a design flaw if you prefer).
For context, the file we are trying to write to is /proc/sys/net/ipv4/ip_unprivileged_port_start. @stgraber figured out that the problematic AppArmor rules are the rules they have which block writing to most /sys files. How is it possible that one affects the other?

Well, the problem is that runc now uses a detached mount of procfs to operate on (this avoids mount race attacks). Because detached mounts have not been attached to the filesystem, d_name (the kernel's facility for generating names for dentries) just generates a name that looks like /foo if you try to open a file foo inside the detached procfs mount. AFAICS this is what AppArmor uses to determine what file you are trying to write to (because AppArmor is path-based, and d_name is the only way to get pathnames from dentries).

This means that when we try to write to /proc/sys/net/ipv4/ip_unprivileged_port_start, AppArmor sees this as us trying to write to /sys/net/ipv4/ip_unprivileged_port_start which is forbidden by the /sys denial rules. I have attached a program that can show this behaviour using a detached tmpfs mount, it's very trivial to trigger:

    % ./aa-bug &
    c1:~ # ./aa-bug &
    fd: /proc/2061/fd/5
    [1] 2061
    c1:~ # mkdir /proc/2061/fd/5/sys
    c1:~ # mkdir /proc/2061/fd/5/sys/foo
    mkdir: cannot create directory ‘/proc/2061/fd/5/sys/foo’: Permission denied

There is a trivial workaround for this particular sysctl:

    - deny /sys/[^fdck]*{,/**} wklx,
    + deny /sys/[^fdckn]*{,/**} wklx,

(In /etc/apparmor.d/abstractions/lxc/container-base.)

But this doesn't help in the general case for all sysctls. @stgraber has just submitted lxc/incus#2624 which just removes these rules entirely. I think AppArmor should not do this, because it's incredibly broken (literally any detached mount could match against a rule by accident), but this is unfortunately how AppArmor's design works.
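
To make the rule matching concrete, here is a rough Python model of what I described above. It is only a sketch: aa_glob_to_regex is a hypothetical helper that approximates a small subset of AppArmor's globbing (it is not the real parser), and path_seen_by_lsm just encodes my "AFAICS" assumption that a detached mount yields names rooted at /. None of these names exist in runc or AppArmor.

```python
import re


def aa_glob_to_regex(glob: str) -> str:
    """Translate a small subset of AppArmor path globs into a Python regex.

    Approximation for illustration only; edge cases of the real parser
    (escapes, trailing-slash handling, etc.) are not modelled.
    """
    out = []
    depth = 0  # nesting level of {a,b} alternations
    i = 0
    while i < len(glob):
        c = glob[i]
        if glob.startswith("**", i):
            out.append(".*")  # ** may cross directory separators
            i += 2
            continue
        if c == "*":
            out.append("[^/]*")  # * stays inside one path component
        elif c == "?":
            out.append("[^/]")
        elif c == "[":
            end = glob.index("]", i + 1)
            out.append(glob[i:end + 1])  # character class carried over as-is
            i = end
        elif c == "{":
            out.append("(?:")
            depth += 1
        elif c == "}" and depth > 0:
            out.append(")")
            depth -= 1
        elif c == "," and depth > 0:
            out.append("|")
        else:
            out.append(re.escape(c))
        i += 1
    return "".join(out)


def denied(rule_glob: str, path: str) -> bool:
    """True if the (modelled) deny rule matches the whole path."""
    return re.fullmatch(aa_glob_to_regex(rule_glob), path) is not None


def path_seen_by_lsm(mountpoint, relative: str) -> str:
    """Model of the generated name: mountpoint=None means a detached mount,
    so the name is rooted at "/" instead of the real mountpoint."""
    prefix = "" if mountpoint is None else mountpoint.rstrip("/")
    return prefix + "/" + relative.lstrip("/")


OLD_RULE = "/sys/[^fdck]*{,/**}"
NEW_RULE = "/sys/[^fdckn]*{,/**}"
SYSCTL = "sys/net/ipv4/ip_unprivileged_port_start"

if __name__ == "__main__":
    for mountpoint in ("/proc", None):
        seen = path_seen_by_lsm(mountpoint, SYSCTL)
        print(seen, "old:", denied(OLD_RULE, seen), "new:", denied(NEW_RULE, seen))
```

In this model the attached path /proc/sys/net/... never matches the /sys rule, the detached one matches the old rule, and adding n to the character class lets it through. Something like sys/vm/overcommit_memory is still caught by the new rule when detached, which is why the workaround doesn't generalise.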

From runc's side, we could in theory use this to our advantage -- if we created a tmpfs with a subpath like .go-away-apparmor and then attached our procfs mount to that path, we might be able to subvert AppArmor. However, this has a risk of causing lifetime issues that would require a rework of how we do lookups -- the tmpfs must not be closed after we attach to it because it will lazy-unmount the procfs...