
Library can't handle large files (2GB+ decompressed parts) #838

@Symbiatch

Description

The library can only read files with parts up to 2 GB (minus one byte?) decompressed, nothing larger, even though the file on disk may be much smaller. Such files are quite common in the field I work in. A recent file I had contains over 80,000 compressed blocks of 26,996 bytes each, which exceeds the limit once decompressed. The error message is also unclear: it complains about a negative capacity, which comes from .NET because the library does not check for integer overflow.

This issue is at least in these parts:

  • MemoryStream is used, which cannot hold larger blocks of memory
  • Offsets returned for handles use int, which overflows and produces negative values (and could also produce incorrect positive values); see the sketch after this list
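
To illustrate with the numbers from the file above (this is a standalone sketch, not the library's actual offset code): multiplying the block size by the block count in 32-bit arithmetic wraps around to a negative value, which is exactly the kind of negative capacity .NET then complains about, while 64-bit arithmetic gives the correct offset.

```csharp
using System;

class OffsetOverflowDemo
{
    static void Main()
    {
        // Figures from the issue: ~80,000 blocks of 26,996 decompressed bytes each.
        int blockSize = 26_996;
        int blockCount = 80_000;

        int intOffset = blockSize * blockCount;          // 32-bit multiply wraps around
        long longOffset = (long)blockSize * blockCount;  // 64-bit multiply is correct

        Console.WriteLine(intOffset);    // -2135287296 (the negative "capacity")
        Console.WriteLine(longOffset);   // 2159680000 (~2.16 GB, past the 2 GB limit)
    }
}
```

Switching the offset arithmetic to long avoids the wraparound, but the decompressed data itself still needs something other than a single MemoryStream once it passes 2 GB.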

The reader also allocates a second buffer for the decompressed data (when it decides whether to use CRC or not), which means a part that decompresses to 1.9 GB needs at least 3.8 GB of memory to read. There should not be any need for this, and even with smaller files the overhead adds up.
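
As a rough sketch of what I mean (assuming deflate-compressed blocks and the System.IO.Hashing package for CRC-32; the names here are illustrative, not the library's actual API), the block can be decompressed straight into one pre-sized buffer and the checksum computed over that same buffer, so the decompressed data is never held in memory twice:

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.IO.Hashing;   // NuGet package System.IO.Hashing

static class SingleBufferReader
{
    // Decompresses one block straight into a single pre-sized buffer and
    // computes the CRC-32 over that same buffer, so the decompressed data
    // is only ever held once.
    public static byte[] DecompressBlock(byte[] compressed, int decompressedSize, out uint crc)
    {
        var buffer = new byte[decompressedSize];                  // the only allocation
        using var input = new MemoryStream(compressed, writable: false);
        using var deflate = new DeflateStream(input, CompressionMode.Decompress);

        int total = 0, read;
        while (total < buffer.Length &&
               (read = deflate.Read(buffer, total, buffer.Length - total)) > 0)
            total += read;                                        // fill the buffer in place

        crc = BitConverter.ToUInt32(Crc32.Hash(buffer), 0);       // CRC over the same buffer
        return buffer;
    }
}
```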

I also think there could be performance benefits from handling the reading/decompression a bit differently, rather than going through stream APIs that are built around reading and writing multi-byte chunks.

I have created an internal fix and could clean it up into a PR if nobody else is working on this issue and it's seen as useful. I assume it would be okay to tackle the whole thing in one PR rather than making smaller changes and waiting for each to trickle through.

This happens with both the NuGet packages and the latest master, since the code in that area hasn't changed.

Metadata

Labels

bug (Something isn't working)
