-
Notifications
You must be signed in to change notification settings - Fork 170
[RFC] ostree-ext: import container layers directly from overlay filesystem #1962
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[RFC] ostree-ext: import container layers directly from overlay filesystem #1962
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a proof-of-concept for directly importing container layers from an overlay filesystem, aiming to improve performance by leveraging reflinks. This involves adding a new import path in ostree-ext that bypasses tar stream processing, using ostree::Repo::write_dfd_to_mtree and plumbing Storage context through bootc. However, the implementation contains a path traversal vulnerability in the way layer directory paths are constructed from layers.json. The use of open_ambient_dir and std::fs::read_dir in the filesystem import logic is insecure and should be replaced with more robust cap_std operations to prevent potential host filesystem access and TOCTOU issues. Additionally, the new filesystem-based import path does not set an ostree ref for the imported layer commit, which breaks layer caching and negates performance benefits for subsequent operations.
| } | ||
|
|
||
| // Construct the layer diff path: $STORAGE/overlay/$LAYER_ID/diff | ||
| let layer_diff_path = storage_root.join(format!("overlay/{}/diff", layer_id)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The layer_id obtained from layers.json is used to construct layer_diff_path using storage_root.join(). Since layer_id is not validated, it can contain path traversal sequences like .., allowing an attacker who can influence the contents of layers.json to point layer_diff_path to arbitrary directories on the host. This is a path traversal vulnerability that could lead to unauthorized access to host files during the container import process.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're effectively trusting the source storage so this is not a critical problem at all, but containers/composefs-rs#218 comprehensively uses cap_std Dir for this reason.
| async fn import_layer_from_filesystem( | ||
| repo: &ostree::Repo, | ||
| layer_path: &Utf8Path, | ||
| target_ref: &str, | ||
| options: &crate::tar::WriteTarOptions, | ||
| ) -> Result<crate::tar::WriteTarResult> { | ||
| tracing::info!( | ||
| "import_layer_from_filesystem: layer_path={}, target_ref={}", | ||
| layer_path, | ||
| target_ref | ||
| ); | ||
|
|
||
| let config = crate::filesystem::FilesystemFilterConfig::from_tar_options(options); | ||
| let repo = repo.clone(); | ||
| let layer_path = layer_path.to_owned(); | ||
|
|
||
| // Run the synchronous import in a blocking task | ||
| // Note: The caller is responsible for setting the ref within a transaction | ||
| let result = crate::tokio_util::spawn_blocking_flatten(move || { | ||
| crate::filesystem::import_filesystem_to_ostree(&repo, &layer_path, &config) | ||
| }) | ||
| .await?; | ||
|
|
||
| tracing::info!( | ||
| "Successfully imported layer from filesystem: commit={}", | ||
| result.commit | ||
| ); | ||
|
|
||
| Ok(result) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The target_ref parameter is unused in this function. This leads to a bug where layers imported from the filesystem are not cached in ostree via a ref, causing them to be re-imported on every pull.
The else branch for tar stream import correctly uses crate::tar::write_tar which sets the ref. The new filesystem import path should do the same to ensure layer caching works as expected.
To fix this, you should pass target_ref down to import_filesystem_to_ostree and have it set the ref after creating the commit. This will also involve removing the now-incorrect comment about the caller being responsible for setting the ref.
Here are the suggested changes:
1. In crates/ostree-ext/src/container/store.rs:
--- a/crates/ostree-ext/src/container/store.rs
+++ b/crates/ostree-ext/src/container/store.rs
@@ -1297,11 +1297,10 @@
let config = crate::filesystem::FilesystemFilterConfig::from_tar_options(options);
let repo = repo.clone();
let layer_path = layer_path.to_owned();
+ let target_ref = target_ref.to_owned();
// Run the synchronous import in a blocking task
- // Note: The caller is responsible for setting the ref within a transaction
let result = crate::tokio_util::spawn_blocking_flatten(move || {
- crate::filesystem::import_filesystem_to_ostree(&repo, &layer_path, &config)
+ crate::filesystem::import_filesystem_to_ostree(&repo, &layer_path, &target_ref, &config)
})
.await?;
2. In crates/ostree-ext/src/filesystem/filter.rs:
--- a/crates/ostree-ext/src/filesystem/filter.rs
+++ b/crates/ostree-ext/src/filesystem/filter.rs
@@ -72,6 +72,7 @@
pub fn import_filesystem_to_ostree(
repo: &ostree::Repo,
src_path: &Utf8Path,
+ target_ref: &str,
config: &FilesystemFilterConfig,
) -> Result<crate::tar::WriteTarResult> {
let cancellable = gio::Cancellable::NONE;
@@ -188,6 +189,8 @@
.write_commit(None, None, None, Some(&metadata), &root, cancellable)
.context("Writing commit")?;
+ repo.set_ref_immediate(None, target_ref, Some(&commit), cancellable)?;
+
Ok(crate::tar::WriteTarResult {
commit: commit.to_string(),
filtered: filtered_stats,
| let src_dir = cap_std_ext::cap_std::fs::Dir::open_ambient_dir( | ||
| src_path.as_std_path(), | ||
| cap_std_ext::cap_std::ambient_authority(), | ||
| ) | ||
| .with_context(|| format!("Opening source directory: {}", src_path))?; | ||
| let src_dfd = src_dir.as_raw_fd(); | ||
|
|
||
| // Create the root MutableTree | ||
| let root_mtree = ostree::MutableTree::new(); | ||
| let mut filtered_stats = BTreeMap::new(); | ||
|
|
||
| // Create a commit modifier for the import | ||
| // Note: We don't use CONSUME flag since we don't want to modify the source overlay | ||
| let modifier = ostree::RepoCommitModifier::new(ostree::RepoCommitModifierFlags::empty(), None); | ||
|
|
||
| // Set up SELinux labeling from base commit (always enabled when base is provided) | ||
| if let Some(ref base) = config.base { | ||
| modifier.set_sepolicy_from_commit(repo, base, cancellable)?; | ||
| } | ||
|
|
||
| // Create default dirmeta for directories we create | ||
| let dirmeta_checksum = create_default_dirmeta(repo)?; | ||
|
|
||
| // Set the root tree's metadata checksum - required before write_mtree | ||
| root_mtree.set_metadata_checksum(&dirmeta_checksum); | ||
|
|
||
| // Read toplevel entries and process each according to its classification | ||
| let entries = | ||
| std::fs::read_dir(src_path).with_context(|| format!("Reading directory: {}", src_path))?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function import_filesystem_to_ostree uses open_ambient_dir to open the source directory and std::fs::read_dir to iterate over its entries. open_ambient_dir follows symlinks and allows opening arbitrary absolute paths. Combined with the potential path traversal in layer_diff_path, this allows importing arbitrary host directories into the OSTree repository. Furthermore, using std::fs::read_dir instead of src_dir.entries() introduces a TOCTOU (Time-of-Check to Time-of-Use) risk and is inconsistent with the use of the cap_std library for secure filesystem access.
b4aa56a to
bacbf43
Compare
|
Awesome, thanks for kickstarting this! So I have a giant amount of code generated in https://github.com/cgwalters/cstor-rs that is a lot more fleshed-out "read (and maybe write to) containers-storage" - but it also needs some cleanup, review, testing etc. Backing up a second...here's what I'm thinking re #20 Basically let's try to go to a world where the flow is:
The thing that is really appealing to me about this is we're centralizing more on composefs. |
44859a5 to
8324e0f
Compare
I have missed this one. Where is the composefs-rs store? |
It's initialized by default only in the composefs path, but...it would make sense probably to do so by default now even with ostree. It's in |
|
I guess I've hit ostreedev/ostree#2712 as it fails to import the whiteout and it seems there is no way to filter it without changing the source directory |
When pulling container images from local containers-storage, detect overlay filesystem layers and import them directly to OSTree instead of processing tar streams. This allows to use reflinks on file systems that support them. Assisted-by: Claude Opus 4.5 Signed-off-by: Giuseppe Scrivano <[email protected]>
8324e0f to
4a5cad5
Compare
|
|
|
||
| let file = match std::fs::File::open(&layers_json_path) { | ||
| Ok(f) => f, | ||
| Err(e) if e.kind() == std::io::ErrorKind::NotFound => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I maintain this cap-std-ext crate which has https://docs.rs/cap-std-ext/latest/cap_std_ext/dirext/trait.CapStdExtDirExt.html#tymethod.open_optional for this reason.
| let matches = layer.diff_digest.as_deref() == Some(digest) | ||
| || layer.compressed_diff_digest.as_deref() == Some(digest); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm this is interesting, I need to understand the distinction here, will look.
| let root_mtree = ostree::MutableTree::new(); | ||
| let mut filtered_stats = BTreeMap::new(); | ||
|
|
||
| // Create a commit modifier - no SELinux here, it's applied during merge |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We really should set the selinux labeling policy here. Which is messy! Because we need to mount (or compute/access) the merged root for just /etc/selinux.
But not doing so causes effective double disk space usage temporarily, it's very ugly.
Again the SELinux labeling thing is one of the most awesome parts about composefs - we just stick all that in the metadata.
| let name_str = name.to_string_lossy(); | ||
| let file_type = entry.file_type().context("Getting file type")?; | ||
|
|
||
| // Only process directories at toplevel |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You know what's funny:
$ podman run --rm -ti quay.io/fedora/fedora cat /.profile
kiwi_align='1048576'
kiwi_boot_timeout=''
kiwi_bootkernel=''
kiwi_bootloader='grub2'
Yes, we build the Fedora base image with this "kiwi" tool that likes to dump its metadata as a file into /...soooooo ugly
| ) | ||
| .with_context(|| format!("Importing file: {}", file_path.display()))?; | ||
| } else if is_overlay_whiteout(&metadata) { | ||
| // Convert overlay whiteout to OCI format (.wh.<filename>) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm but we're not handling .wh..wh..opq here?
When pulling container images from local containers-storage, detect overlay filesystem layers and import them directly to OSTree instead of processing tar streams. This allows to use reflinks on file systems that support them.
Assisted-by: Claude Opus 4.5
This is just a PoC at the moment, I've not fully reviewed the generated code yet.
@cgwalters This takes a different approach that we discussed, but I believe it is easier to maintain in the containers library, as it is much easier to add a new API to the image/storage library to expose low level objects (i.e. layer diff when available, file objects by their checksum...) that will replace the manual access to the overlay diff that is implemented now.
Another benefit, IMO, is that it moves us away from tarballs which ultimately I'd like to see it happening in the containers library as well.
I've manually tested the change and that reflinks are created to the files in the containers storage.