Conversation

@xermicus
Member

This PR changes the blob parser to only fail if the blob is smaller than what the header says.

Required for implementing what Solidity calls "full/perfect verification" (where the compiler needs to append arbitrary data to the blob).

Signed-off-by: xermicus <cyrill@parity.io>
@xermicus xermicus requested a review from koute November 27, 2025 19:00
@koute
Collaborator

koute commented Jan 5, 2026

Why not include the metadata in the blob itself? That's why we have sections.

@xermicus
Member Author

xermicus commented Jan 5, 2026

I'm not sure how that would work: since the blob contains the metadata hash, it's a chicken-and-egg problem. There's also the aspect that while we could include arbitrary data elsewhere, doing so breaks Ethereum-style workflows, which we don't want.

@koute
Collaborator

koute commented Jan 5, 2026

You wouldn't hash the metadata. The blob can already contain stuff which probably shouldn't be hashed because it doesn't affect the execution (e.g. the debug info).

Basically, you'd define a method like this maybe:

fn each_chunk_for_hash(&self, mut callback: impl FnMut(&[u8])) { ... }

This would skip the metadata and optional stuff which doesn't affect execution, and then to calculate a hash you'd do:

let mut hasher = Hasher::new();
blob.each_chunk_for_hash(|chunk| hasher.update(chunk));
let hash = hasher.finish();

And you'd use this hash to verify that the build is reproducible.

Also, on the plus side, you could use this same hash to build a repository of debug info blobs for programs deployed on-chain. People would strip debug info when uploading on-chain, and upload the full blob somewhere else, and since both hashes would then match it'd be easy to find the matching blob with debug info based on the on-chain blob.
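A minimal sketch of what such a method could look like, using a toy blob type and the standard library's DefaultHasher as a stand-in for whatever hasher would actually be used. The section kinds and types here are illustrative assumptions, not PolkaVM's real API:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

// Illustrative section kinds; these are NOT PolkaVM's real section names.
enum SectionKind { Code, Metadata, DebugInfo }

struct Section { kind: SectionKind, data: Vec<u8> }
struct Blob { sections: Vec<Section> }

impl Blob {
    /// Feed only execution-relevant chunks to the callback, skipping
    /// sections (metadata, debug info) that don't affect execution.
    fn each_chunk_for_hash(&self, mut callback: impl FnMut(&[u8])) {
        for section in &self.sections {
            match section.kind {
                SectionKind::Metadata | SectionKind::DebugInfo => {}
                SectionKind::Code => callback(&section.data),
            }
        }
    }
}

/// Hash a blob over only the chunks yielded by each_chunk_for_hash.
fn blob_hash(blob: &Blob) -> u64 {
    let mut hasher = DefaultHasher::new();
    blob.each_chunk_for_hash(|chunk| hasher.write(chunk));
    hasher.finish()
}

fn main() {
    // An on-chain blob with debug info stripped...
    let on_chain = Blob { sections: vec![
        Section { kind: SectionKind::Code, data: vec![1, 2, 3] },
    ]};
    // ...and the full blob kept off-chain, with extra sections.
    let full = Blob { sections: vec![
        Section { kind: SectionKind::Code, data: vec![1, 2, 3] },
        Section { kind: SectionKind::Metadata, data: vec![7] },
        Section { kind: SectionKind::DebugInfo, data: vec![9, 9] },
    ]};
    // Both hash identically, so the full blob can be looked up
    // from the on-chain blob's hash.
    assert_eq!(blob_hash(&on_chain), blob_hash(&full));
    println!("hashes match");
}
```

The key property is that adding or removing non-hashed sections never changes the hash, which is what makes the debug-info-repository idea work.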

@xermicus
Member Author

xermicus commented Jan 5, 2026

The requirement here is that we need to append CBOR data the same way as solc does.
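For context, solc's convention is to append CBOR-encoded metadata to the bytecode, with the final two bytes storing the big-endian length of the CBOR payload. A sketch of splitting off that trailer (the data in main is a toy example, not real contract bytecode):

```rust
/// Split a solc-style blob into (code, cbor_payload). The last two bytes
/// are a big-endian u16 giving the length of the CBOR payload that
/// immediately precedes them.
fn split_cbor_trailer(blob: &[u8]) -> Option<(&[u8], &[u8])> {
    if blob.len() < 2 {
        return None;
    }
    let len = u16::from_be_bytes([blob[blob.len() - 2], blob[blob.len() - 1]]) as usize;
    let total = len + 2;
    if blob.len() < total {
        return None;
    }
    let split = blob.len() - total;
    // Code portion, then the CBOR payload (excluding the two length bytes).
    Some((&blob[..split], &blob[split..blob.len() - 2]))
}

fn main() {
    // Toy blob: 4 "code" bytes + 3 CBOR bytes + 2 length bytes (0x0003).
    let blob: [u8; 9] = [0xAA, 0xBB, 0xCC, 0xDD, 0x01, 0x02, 0x03, 0x00, 0x03];
    let (code, cbor) = split_cbor_trailer(&blob).unwrap();
    assert_eq!(code, &[0xAA_u8, 0xBB, 0xCC, 0xDD][..]);
    assert_eq!(cbor, &[0x01_u8, 0x02, 0x03][..]);
    println!("trailer split ok");
}
```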

@koute
Collaborator

koute commented Jan 5, 2026

In that case I'd prefer if you'd just truncate the blob you pass to from_bytes yourself, which you could do at zero cost by making ArcBytes::subslice public (or, if you can do it before the conversion to ArcBytes, you need no changes at all).

ProgramParts::from_bytes(bytes.subslice(0..ProgramBlob::blob_length(&bytes)))
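A sketch of doing that truncation downstream, with a toy header format standing in for the real one. Here blob_length is a hypothetical stand-in for ProgramBlob::blob_length (assumed to read the declared length out of the header); in this toy format the first four bytes are a little-endian u32 total length:

```rust
/// Stand-in for ProgramBlob::blob_length: in this TOY format the first
/// four bytes are a little-endian u32 giving the total blob length.
fn blob_length(bytes: &[u8]) -> Option<usize> {
    let header: [u8; 4] = bytes.get(..4)?.try_into().ok()?;
    Some(u32::from_le_bytes(header) as usize)
}

/// Truncate to the declared length, dropping any appended trailer,
/// before handing the bytes to the parser.
fn truncate_blob(bytes: &[u8]) -> Option<&[u8]> {
    let len = blob_length(bytes)?;
    bytes.get(..len)
}

fn main() {
    // An 8-byte blob (length header included) with 3 bytes appended.
    let mut bytes: Vec<u8> = vec![8, 0, 0, 0, 1, 2, 3, 4];
    bytes.extend_from_slice(&[0xDE, 0xAD, 0xBE]);

    let blob = truncate_blob(&bytes).unwrap();
    assert_eq!(blob.len(), 8);
    assert_eq!(blob, &bytes[..8]);
    println!("truncated to declared length");
}
```

This keeps the appended-trailer handling entirely in downstream code; the core parser only ever sees a blob of exactly the declared length.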

xermicus and others added 2 commits January 13, 2026 11:11
Signed-off-by: Cyrill Leutwiler <bigcyrill@hotmail.com>
@xermicus
Member Author

Thanks for the suggestion.

  • Reverted the original change
  • Made the API public
  • Changed the caller to pass a slice as defined by the blob length

Comment on lines +5233 to +5236
let blob_length = Self::blob_length(&bytes).ok_or(ProgramParseError(ProgramParseErrorKind::Other(
"blob does not contain the blob length",
)))?;
let bytes = bytes.subslice(0..blob_length as usize);
Collaborator

In general I meant doing this outside of PolkaVM code in downstream code; that's why I said to make subslice public, so that you don't need any special support for it here. :P

The .polkavm blob is essentially supposed to be self-contained; if you want to append/prepend data to it then you're welcome to do so, but I don't want to be responsible for handling that data. I'm probably going to e.g. also supply official APIs to write .polkavm blobs in the future (including an API to losslessly convert back and forth), and I just don't want to deal with all of the miscellaneous variations of adding extra metadata that explicitly avoid the intended way of adding such metadata (as a dedicated section inside the file), especially since I have no way of knowing how to handle such grafted metadata correctly.

And the proliferation of different PVM "blob" formats is not a theoretical issue, we already have these different formats:

  • Vanilla .polkavm blob as my tooling builds it
  • JAM service blob (metadata header + minimal PVM header + PVM code sections)
  • CoreVM service blob (metadata header same as JAM service blob + full .polkavm blob)
  • The contracts blob (full .polkavm blob + appended data)

The .polkavm blob was explicitly designed to be extensible so that this isn't necessary! But instead everyone just invents their own flavor, so I'm drawing a line in the sand - I can provide APIs to make the grafting of extra metadata possible, but I don't want to have to deal with it in my core crates.

Member Author

@xermicus xermicus Jan 13, 2026


I see the point, but ignoring extra data wouldn't be part of the format, would it? The way I see it, the only thing it does is make the parser a bit more resilient. It also saves us from maintaining a custom version of polkatool.

Isn't it more like how an ELF file with trailing data can still be parsed by readelf and objdump, and still executes fine?
