Description
Documentation
The tarfile docs do not make it clear how a programmer can read data from a tar file into memory without a round trip through the file system. As far as I understand, reading a single member's data from an uncompressed tar file essentially amounts to the following steps:
    import tarfile

    # Open in binary mode; a tar archive is not text.
    with open("myfile.tar", "rb") as f:
        tar = tarfile.TarFile(fileobj=f)
        # Pick the first regular file (the method is TarInfo.isfile()).
        tar_info = next(member for member in tar.getmembers() if member.isfile())
        # offset_data is where this member's contents start in the archive.
        f.seek(tar_info.offset_data)
        data = f.read(tar_info.size)

However, to arrive at this, you either need to be confident enough to read the CPython source code, or you need to know that tar files store the byte contents unchanged and that TarInfo.size is the size of the data without the file header. Neither of these is obvious to less experienced programmers.
I suggest that we make two changes to the tarfile docs:
- Expand the documentation for TarInfo.size so it says more than just "Size in bytes". Size of what exactly? The archived file, as far as I can tell.
- Include a minimal example (like the one I have above, but maybe slightly more pedagogical) in the Reading Examples section.
I can propose a PR with these changes if you think that is useful.
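As a rough idea of what a more pedagogical example might look like, here is a sketch that builds a small uncompressed archive in memory and reads the first regular file back two ways: through TarFile.extractfile(), and through the seek-and-read approach described above. The helper names build_demo_archive and read_first_file, the member name greeting.txt, and the payload are made up for this sketch, and the raw seek only makes sense for uncompressed archives.

```python
import io
import tarfile


def build_demo_archive() -> bytes:
    """Create a small uncompressed tar archive entirely in memory (hypothetical helper)."""
    payload = b"hello from inside the archive\n"
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        info = tarfile.TarInfo(name="greeting.txt")
        # TarInfo.size must match the number of content bytes being added.
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


def read_first_file(archive_bytes: bytes) -> tuple[bytes, bytes]:
    """Return the first regular file's contents, read two different ways."""
    raw = io.BytesIO(archive_bytes)
    # "r:" opens the archive for reading without any compression.
    with tarfile.open(fileobj=raw, mode="r:") as tar:
        member = next(m for m in tar.getmembers() if m.isfile())

        # 1. The file-like object returned by extractfile(); nothing is
        #    written to the file system.
        via_api = tar.extractfile(member).read()

        # 2. Seeking in the underlying file object: the contents start at
        #    offset_data and are exactly `size` bytes long.
        raw.seek(member.offset_data)
        via_seek = raw.read(member.size)
    return via_api, via_seek


via_api, via_seek = read_first_file(build_demo_archive())
assert via_api == via_seek
```

Both reads return the same bytes, which is the point I would like the TarInfo.size documentation to make explicit: size counts the archived file's contents only, not the header block that precedes them.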