I'm not sure this is something that needs to be addressed in adlfs, but I think it's worth noting that the etags returned by fs.info() and fs.ls(detail=True) have different formats. fs.info() returns the etag wrapped in double quotes ("), while fs.ls() returns it unquoted. As a result, the caller has to normalize the quoting before comparing etags to check whether a file has been modified.
With adlfs 2024.2.0:
>>> from adlfs import AzureBlobFileSystem
>>> fs = AzureBlobFileSystem()
>>> fs.info("az://test-deletion/2024.2.0/foo", refresh=True)["etag"]
'"0x8DC2DFC9E520378"'
>>> [(f["name"], f["etag"]) for f in fs.ls("az://test-deletion/2024.2.0/", detail=True, refresh=True)]
[('test-deletion/2024.2.0/foo', '0x8DC2DFC9E520378')]
I did some investigation, and it looks like this comes from differences in what the Azure API returns for different calls. With BlobClient.get_blob_properties(), the returned etag property is wrapped in double quotes. With ContainerClient.walk_blobs(), however, the etag property of each iterated blob's properties is not quoted.
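For anyone hitting the same thing, here is a minimal sketch of the workaround on the caller side: strip one pair of surrounding double quotes before comparing. The helper names normalize_etag and etags_match are hypothetical (they are not part of adlfs or the Azure SDK), and the sketch assumes the quoting is the only difference between the two values.

```python
def normalize_etag(etag):
    """Return the etag without one pair of surrounding double quotes.

    Hypothetical helper, not part of adlfs. Assumes the only difference
    between the fs.info() and fs.ls() values is the quoting.
    """
    if etag is None:
        return None
    # Only strip when the value is fully wrapped, so a stray quote is left alone.
    if len(etag) >= 2 and etag.startswith('"') and etag.endswith('"'):
        return etag[1:-1]
    return etag


def etags_match(first, second):
    """Compare two etags while ignoring the quoting difference."""
    return normalize_etag(first) == normalize_etag(second)


# Values taken from the session above.
info_etag = '"0x8DC2DFC9E520378"'  # as returned by fs.info()
ls_etag = '0x8DC2DFC9E520378'      # as returned by fs.ls(detail=True)
print(etags_match(info_etag, ls_etag))
```

Normalizing both sides, rather than adding quotes to one, keeps the comparison working regardless of which call produced which value.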