-
Notifications
You must be signed in to change notification settings - Fork 6
45-log-compression #52
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
… of bytes and compresses file to gzip beforehand using a temporary file in case it has to handle large files (not in-memory buffer)
…esses files to different formats; use the instance and a compress method inside the upload method
… section of the config.md
wvengen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow, you made it with all the different compression methods, way to go 🚀 :)
One thing I am missing, is not to use compression (compression method none, which may be the default, for backward compatibility).
Do we need to update the dependencies for brotli (nice btw to make it optional)?
|
p.s. opened #53 as a follow-up issue. |
|
Just to be clear because in the ticket you wrote that gzip should be default.
would you like me to make default as None? So no compression at all? |
Ah, yes, gzip as default compression method. But I think it would be helpful to keep compression off by default, then we can later transition to gzip if we want to. So yes please, default |
|
Sorry, one more question to clarify, when you say
Do you mean that if the provided by user compression method is not listed as implemented, then instead of raising an exception we would proceed with gzip and write a log that explains why we defaulted to gzip? Is this the behavior you have in mind? |
|
Thanks for asking. I think it makes sense to have default compression method |
… as None; add brotli as a dependency but still treat as optional; make a compression chunk size of 1024 a class constant
wvengen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking great, thank you!
Please let me know when you think it's ready to merge :)
Can you please verify that when passing multiple configuration files, it is still possible to reset the compression_method to the empty value? If not, then we may need to introduce none as an explicit value.
|
Perhaps having an explicit compression method of |
|
This is possible, sure, would you like me to add None as an option for the compression_method? The parameter then always should be provided in the config with a desired value. |
|
That would be more explicit. Then we can use |
|
So, yes please! |
…ping compression of the log files; update config to reflect the current implementation
|
For now I haven't provided any default if the compression_method is not provided in the config file, but when we extract the compression method (with |
wvengen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great! I think we're almost there, just two suggestions that could make the code clearer.
Also, please review the use of single vs. double quotes, I think it's almost consistent.
| if self.compression_method and self.compression_method != 'none' and compressed_file_path != local_path: | ||
| logger.info( | ||
| f"Successfully uploaded compressed file '{object_name}' to container '{self._container_name}'.") | ||
| else: | ||
| logger.info(f"Successfully uploaded file '{object_name}' to container '{self._container_name}'.") | ||
| except (ObjectError, ContainerDoesNotExistError, InvalidContainerNameError) as e: | ||
| logger.exception(f"Error uploading the file '{object_name}': {e}") | ||
| except Exception as e: | ||
| logger.exception(f"An unexpected error occurred while uploading '{object_name}': {e}") | ||
| finally: | ||
| # Remove temporary file even if upload fails | ||
| if compressed_file_path and compressed_file_path != local_path and os.path.exists(compressed_file_path): | ||
| os.remove(compressed_file_path) | ||
| logger.debug(f"Removed temporary compressed file '{compressed_file_path}'.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it the case the compressed_file_path is only set when the file is compressed, and stored compressed with a different extension? In that case, we don't need to test for compressed_file_path != local_path - we can just check if compressed_file_path is set, right? That would make the code more explicit, in my view.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this makes sense, thanks for noticing
| COMPRESSION_EXTENSIONS = { | ||
| 'gzip': 'gz', | ||
| 'bzip2': 'bz2', | ||
| 'lzma': 'xz', | ||
| 'brotli': 'br' | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about moving this to the Compressor, e.g. providing a method to return an extension for a compression method, or even giving a compressed filename for an uncompressed filename? Then we only have to adapt one file when we'd add a compression method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a good point to move to the compression class but not a good idea to move responsibility of naming the final upload destination to the compression class since it is fully the responsibility of the upload method in the libcloud class. Would be better to stick to direct functions of each class. I can add a method to provide an appropriate file extension or something like this and move this mapping to the compression class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm in my mind, providing the path of the compressed file (based on the uncompressed path) is somewhat related to compression, so it would fit there. E.g. most CLI tools that do compression, rename the file as well.
But if you think it belongs where it is now, fine to leave it as it is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does already return the compressed file path which is then used by the upload function but we also need an extension type to compose the destination in the remote storage for the upload. That is why it seems good to add a new method to get the appropriate extension instead of, for example, parsing the returned compressed file name or something less transparent and maintainable. I am adding the changes, so you can see the exact thing I am referring here to.
…dd a method to return the extension
wvengen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Super, thank you!
#45
This PR adds a new class that is responsible for compression of the log files. It reads a log file from disk in chunks and writes to a temporary compressed file on a disk (an alternative would be to use an in-memory buffer which is not suitable for large log files, so in sake of user experience it uses disk). Upload function then uses the compression method to upload a file to the remote storage. Depending on the preferred compression method, a log file gets an appropriate extension because of the mapping that is added as a class attribute to the LIbcloud class.