-
Notifications
You must be signed in to change notification settings - Fork 75
[WIP] storage: add splitfdstream for efficient layer transfer with reflink support #651
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Giuseppe Scrivano <[email protected]>
4cca2a3 to
3b6f591
Compare
mtrmac
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- 6% of the code base for an extremely niche use case :/
I’m afraid I can’t spend time on this now; and anyway some other work on pkg/archive needs to happen first, and expect the rebase to be somewhat non-trivial.
what other work would block this feature? I don't love either the amount of extra code, but it is almost all in new APIs/packages that are experimental and not leaking too much to the rest of the library. We went around this problem for a long time, and in the end this seems like an ~ok solution to not expose too much of the containers storage internals augmenting the tar format that is already plugged everywhere. |
3b6f591 to
26cd4b6
Compare
|
to add more context: the goal of this feature is to be able to copy a layer, similarly to what we would do with There are other interesting use cases enabled with a RPC like facilitate usage from different languages (Rust in our case), inject into a build container to grab R/O access to layers/images, but this is out of scope now. I realize complexity is not trivial just to get access to the raw files, but what alternatives do we have? From the store API PoV we could just extend it to return a map Any suggestions? |
|
(Just to avoid a misunderstanding, I didn’t really read the PR. I have, at this point, ~no opinion on RPC vs. Go API.) |
| return &tarReaderIterator{ | ||
| tr: tr, | ||
| trBuf: trBuf, | ||
| buffer: make([]byte, 1<<20), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like 1<<20 should be in a const somewhere with a rationale?
|
|
||
| // splitFDStreamSocketDescriptor is the fd for the Unix socket used to | ||
| // receive file descriptors via SCM_RIGHTS in the re-exec child. | ||
| const splitFDStreamSocketDescriptor = 5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hardcoding this seems a bit fragile, can't we pass it as an env var or a CLI arg?
| // FDs are streamed to the child process one-at-a-time over a Unix socket | ||
| // using SCM_RIGHTS, avoiding EMFILE from inheriting too many FDs at exec. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't have to be truly one at a time, see also https://github.com/cgwalters/jsonrpc-fdpass?tab=readme-ov-file#41-fd-batching
In my initial PoC work here I handled this by just sending the splitfdstream inline data itself as a fd (could be O_TMPFILE or memfd or pipe), and then sending all the associated fds over jsonrpc-fdpass which automatically handles buffering.
Ensure we processing just 1 recvmsg() at a time (only ~200 fds on Linux) seems really reasonable and unlikely to get anywhere close to modern fd limits.
So if we do choose jsonrpc-fdpass, I think we also need to standardize this "splitfdstream serialization".
| return err | ||
| } | ||
|
|
||
| // Try copy_file_range - kernel-level copy, more efficient than userspace |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Surprising we weren't doing this before.
storage/pkg/splitfdstream/fdpass.go
Outdated
|
|
||
| // ReadLine reads bytes until a newline is encountered, using recvmsg. | ||
| // Any FDs received are closed since line-reading doesn't expect them. | ||
| func (p *FDPasser) ReadLine() ([]byte, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would one want this?
|
|
||
| // JSON-RPC 2.0 protocol constants | ||
| const ( | ||
| JSONRPCVersion = "2.0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW see also https://github.com/cgwalters/jsonrpc-fdpass-go
Signed-off-by: Giuseppe Scrivano <[email protected]>
26cd4b6 to
23ed544
Compare
Signed-off-by: Giuseppe Scrivano <[email protected]>
Extend the store with splitfdstream capabilities exposed via a UNIX socket for JSON-RPC communication. Signed-off-by: Giuseppe Scrivano <[email protected]>
23ed544 to
8d74f84
Compare
|
still PoC, so you can avoid a review as it is not yet ready but we can use to analyze the complexity. Moved to use https://github.com/cgwalters/jsonrpc-fdpass-go Now the code size is almost 50% than the last revision |
Signed-off-by: Giuseppe Scrivano <[email protected]>
Implement the SplitFDStreamDriver interface for the overlay driver, enabling efficient layer operations with reflink support. Signed-off-by: Giuseppe Scrivano <[email protected]>
Signed-off-by: Giuseppe Scrivano <[email protected]>
Signed-off-by: Giuseppe Scrivano <[email protected]>
8d74f84 to
51a6c01
Compare
|
Packit jobs failed. @containers/packit-build please check. |
Add a new
splitfdstreampackage that enables efficient container layer transfer between storage instances by separating tar metadata from file content. File content is passed as file descriptors.