Unclear documentation for TarFile #146396

@yngvem

Description

Documentation

The tarfile docs do not make it clear how a programmer can read data from a tar archive into memory without a round trip through the file system. As far as I understand, reading a single member's data essentially amounts to the following steps:

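import tarfile

# Tar archives are binary, so the file must be opened in "rb" mode
with open("myfile.tar", "rb") as f:
    tar = tarfile.TarFile(fileobj=f)

    # Members are listed on the TarFile, not on the raw file object
    tar_info = next(member for member in tar.getmembers() if member.isfile())
    # The member's bytes are stored unchanged, starting at offset_data
    f.seek(tar_info.offset_data)
    data = f.read(tar_info.size)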
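A self-contained sketch of the same approach, assuming an uncompressed, non-sparse archive (the raw-offset trick does not work on the compressed stream of a `.tar.gz`, nor for GNU sparse members). It builds a small archive in memory with `io.BytesIO`, so nothing touches the file system, and compares the result with `TarFile.extractfile()`, which returns a file-like object for a member without extracting it to disk. The helper names `build_archive` and `read_member_raw` are hypothetical.

```python
import io
import tarfile


def build_archive(files):
    """Hypothetical helper: create an uncompressed tar archive in memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, payload in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)  # size of the member's contents only
            tar.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    return buf


def read_member_raw(fileobj, member):
    """Hypothetical helper: read a member's bytes directly via its data offset."""
    fileobj.seek(member.offset_data)
    return fileobj.read(member.size)


archive = build_archive({"hello.txt": b"hello, world\n", "data.bin": bytes(range(10))})

# "r:" opens the archive explicitly as uncompressed
with tarfile.open(fileobj=archive, mode="r:") as tar:
    members = [m for m in tar.getmembers() if m.isfile()]
    raw = {m.name: read_member_raw(archive, m) for m in members}
    # Public API: extractfile() gives an in-memory file object per member
    via_api = {m.name: tar.extractfile(m).read() for m in members}

print(raw == via_api)
```

Both paths should yield identical bytes here; `extractfile()` has the advantage of also handling compressed archives and sparse members.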

However, to arrive at this, you either need to be confident enough to read the CPython source code, or you need to know that tar files store each member's bytes unchanged and that TarInfo.size is the size of that data, excluding the file header. Neither of these is obvious to less experienced programmers.
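To make these two facts concrete, here is a small sketch that builds a one-member archive in memory and inspects its layout. It uses the `TarInfo.offset` and `TarInfo.offset_data` attributes set on members read from an archive, plus the module constant `tarfile.BLOCKSIZE` (512). The single-block header gap assumes a short ASCII member name, so that no extra GNU or pax extended header is written before the data.

```python
import io
import tarfile

payload = b"abc"

# Write a single small member into an in-memory, uncompressed archive
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    info = tarfile.TarInfo(name="abc.txt")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

# Read it back and look up the member's metadata
buf.seek(0)
with tarfile.open(fileobj=buf, mode="r:") as tar:
    member = tar.getmember("abc.txt")

raw = buf.getvalue()
header_len = member.offset_data - member.offset  # bytes of header before the data
stored = raw[member.offset_data:member.offset_data + member.size]

print(member.size)                     # contents only, no header
print(header_len == tarfile.BLOCKSIZE)  # one 512-byte header block
print(stored)                          # the bytes as they were added
```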

I suggest that we make two changes to the tarfile docs:

  1. Expand the documentation for TarInfo.size so it says more than just "Size in bytes". Size of what, exactly? As far as I can tell, it is the size of the archived file's contents.
  2. Add a minimal example (like the one above, but perhaps slightly more pedagogical) to the Reading Examples section.

I can open a PR with these changes if you think that would be useful.

Metadata

Assignees

No one assigned

    Labels

    docs: Documentation in the Doc dir

    Projects

    Status

    Todo

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests