This Python script is like a vacuum for your files, stealthily removing metadata from a variety of file types. Whether it's images, PDFs, DOCX files, audio files, PowerPoint presentations, OpenDocument files, EPUBs, RTF files, or ZIP archives, this tool has you covered. With a simple GUI built on Tkinter and enhanced with concurrency for better performance, cleaning your files for anonymity is really easy (though processing large batches might take a little time).
Important: This tool is like a magic eraser—it permanently removes metadata and overwrites the original files. To avoid any "Oops!" moments, consider duplicating your files into a temporary folder before unleashing this tool. I am planning to add a mode that doesn't overwrite files in the future, but for now, proceed with caution (also laziness 😉).
Before you start, make sure you have Python 3 installed. This script uses several third-party libraries, so let's get them ready with this spell:
pip install Pillow PyPDF2 python-docx mutagen openpyxl piexif python-pptx odfpy
Double-check your Python version with python --version to ensure compatibility. (Tested on 3.11.8)
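If you want to confirm the install worked before launching the GUI, a small check like the one below can help. It is only a sketch and not part of the script: the function name missing_packages is made up for illustration, and the mapping pairs each pip package from the command above with the module name it is normally imported under.

```python
from importlib.util import find_spec

# pip package name -> module name it is usually imported as.
REQUIRED = {
    "Pillow": "PIL",
    "PyPDF2": "PyPDF2",
    "python-docx": "docx",
    "mutagen": "mutagen",
    "openpyxl": "openpyxl",
    "piexif": "piexif",
    "python-pptx": "pptx",
    "odfpy": "odf",
}


def missing_packages(required=REQUIRED):
    """Return the pip names whose module cannot be found (hypothetical helper)."""
    # find_spec() looks a module up without importing it and returns None if absent.
    return [pkg for pkg, module in required.items() if find_spec(module) is None]
```

Calling missing_packages() returns an empty list when everything is in place; anything it does return can be passed straight back to pip install.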
Note:
- To handle legacy PowerPoint files (.ppt), you may need additional tools such as unoconv or LibreOffice in headless mode, as the current script does not support them directly.
- Logging is implemented and writes to metadata_removal.log in the script's directory for better traceability.
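For the legacy .ppt case, one possible workaround is to convert the file to .pptx with LibreOffice first and then feed the result to the tool. The sketch below assumes the LibreOffice launcher is available on your PATH as soffice (on some systems it has a different name or location); both function names are hypothetical and nothing here is part of the script itself.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(ppt_path, out_dir, soffice="soffice"):
    """Build the headless LibreOffice command that converts a file to .pptx."""
    return [
        soffice,
        "--headless",
        "--convert-to", "pptx",
        "--outdir", str(out_dir),
        str(ppt_path),
    ]


def convert_ppt(ppt_path, out_dir):
    """Run the conversion and return the path where the .pptx is expected."""
    exe = shutil.which("soffice")  # assumption: the launcher is named "soffice"
    if exe is None:
        raise FileNotFoundError("LibreOffice (soffice) was not found on PATH")
    subprocess.run(build_convert_command(ppt_path, out_dir, exe), check=True)
    return Path(out_dir) / (Path(ppt_path).stem + ".pptx")
```

The converted copy is a new file, so the original .ppt still carries its metadata and should be handled separately.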
To start your metadata-removal tool, follow these steps:
- Install Dependencies: Ensure all dependencies are installed as mentioned above.
- Run the Script: Navigate to the script's directory in your terminal and cast the following spell:
  python metadata_removal_tool.py
- Select Files: A mystical window will appear. Command it by clicking "Select Files" and choosing the files you wish to cleanse.
- Watch the Magic: Observe as the tool works its magic, notifying you of its victories and defeats with each file processed.
This tool handles:
- Images: .jpg, .jpeg, .png, .gif, .bmp, .tiff
- PDFs: .pdf
- Word Documents: .docx
- PowerPoint Presentations: .pptx (requires python-pptx)
- OpenDocument Files: .odt, .ods (requires odfpy)
- Audio Files: .mp3, .flac
- Excel Spreadsheets: .xlsx
- EPUBs: .epub
- RTF Files: .rtf
- ZIP Archives: .zip
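To make the list above concrete, here is a rough sketch of how a file could be matched to one of these categories by its extension. The category labels and the classify function are invented for illustration; the script's own dispatch logic may look quite different.

```python
from pathlib import Path

# Extensions taken from the supported-formats list; category names are illustrative.
SUPPORTED = {
    "image": {".jpg", ".jpeg", ".png", ".gif", ".bmp", ".tiff"},
    "pdf": {".pdf"},
    "word": {".docx"},
    "powerpoint": {".pptx"},
    "opendocument": {".odt", ".ods"},
    "audio": {".mp3", ".flac"},
    "excel": {".xlsx"},
    "epub": {".epub"},
    "rtf": {".rtf"},
    "zip": {".zip"},
}


def classify(path):
    """Return the category for a path, or None if the extension is unsupported."""
    ext = Path(path).suffix.lower()  # case-insensitive: "Photo.JPG" -> ".jpg"
    for kind, extensions in SUPPORTED.items():
        if ext in extensions:
            return kind
    return None
```

A file such as deck.ppt comes back as None here, which matches the note about legacy PowerPoint files needing a conversion step first.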
Feel free to open pull requests, report issues, or suggest improvements. I will happily help!
GNU General Public License - Feel free to review the license terms in the linked file. Thank you for your interest in my project!
- Expanded File Support:
  - PowerPoint Presentations: Added support for .pptx files using python-pptx.
  - OpenDocument Files: Added support for .odt and .ods files using odfpy.
  - EPUBs & RTFs: Included support for .epub and .rtf files.
  - ZIP Archives: Enhanced handling of .zip files to remove metadata from contained files.
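As a rough idea of what cleaning a ZIP archive can involve, the sketch below rewrites an archive into a new file, resets every member's timestamp, drops comments, and optionally passes each member's bytes through a cleaning callback. It uses only the standard zipfile module; rewrite_zip and clean_member are hypothetical names, and the script's actual ZIP handling is not shown in this README and may differ (for one thing, the script overwrites in place, while this writes to a separate destination).

```python
import zipfile

# Earliest timestamp the ZIP format can store, used here as a neutral value.
NEUTRAL_DATE = (1980, 1, 1, 0, 0, 0)


def rewrite_zip(src_path, dst_path, clean_member=None):
    """Copy src_path to dst_path without timestamps, comments, or permissions.

    clean_member, if given, is called as clean_member(name, data) and must
    return the bytes to store for that member.
    """
    with zipfile.ZipFile(src_path) as zin, \
            zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as zout:
        for item in zin.infolist():
            if item.is_dir():
                continue  # file entries already carry their full path
            data = zin.read(item.filename)
            if clean_member is not None:
                data = clean_member(item.filename, data)
            # A fresh ZipInfo has no comment, no extra field, and no permission bits.
            info = zipfile.ZipInfo(item.filename, date_time=NEUTRAL_DATE)
            info.compress_type = zipfile.ZIP_DEFLATED
            zout.writestr(info, data)
        # The archive-level comment is simply never copied to the new file.
```

Writing to a separate destination keeps the original intact until you have checked the result.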
- Robust JPEG Metadata Removal:
  - Two-Step Approach:
    - Step 1: Attempts to remove EXIF metadata using piexif.remove(), preserving image quality by avoiding re-encoding.
    - Step 2: Verifies if any EXIF data remains. If so, it falls back to re-encoding the image with Pillow to ensure all metadata is stripped, albeit with a slight risk of quality alteration.
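To show why Step 1 can avoid any quality loss, here is a standard-library sketch of the underlying idea: EXIF lives in its own APP1 segment ahead of the compressed image data, so that segment can be cut out while the pixel data is copied byte for byte. This is not piexif's implementation and not the script's code; strip_exif and has_exif are hypothetical helpers, the parser only handles the common layout of length-prefixed segments before the scan, and the Pillow re-encoding fallback from Step 2 is not shown.

```python
import struct

SOI = b"\xff\xd8"              # start-of-image marker
SOS = 0xDA                     # start-of-scan: compressed image data follows
APP1 = 0xE1                    # segment type that carries EXIF
EXIF_HEADER = b"Exif\x00\x00"


def _segments(data):
    """Yield (marker, start, end) for each segment up to and including the scan."""
    if data[:2] != SOI:
        raise ValueError("not a JPEG: missing SOI marker")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == SOS:
            yield marker, pos, len(data)  # scan data runs to the end of the file
            return
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        yield marker, pos, pos + 2 + length  # length counts its own two bytes
        pos += 2 + length
    raise ValueError("truncated JPEG: no SOS marker found")


def _is_exif(data, marker, start):
    return marker == APP1 and data[start + 4:start + 10] == EXIF_HEADER


def has_exif(data):
    """Check whether any EXIF segment is present (the 'verify' part of Step 2)."""
    return any(_is_exif(data, m, s) for m, s, _ in _segments(data))


def strip_exif(data):
    """Return the JPEG bytes with EXIF segments removed and nothing re-encoded."""
    out = bytearray(SOI)
    for marker, start, end in _segments(data):
        if not _is_exif(data, marker, start):
            out += data[start:end]
    return bytes(out)
```

If has_exif() still reports data after a lossless pass, that is the situation in which falling back to re-encoding makes sense.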
- Concurrency for Enhanced Performance:
  - Uses ThreadPoolExecutor from concurrent.futures to process multiple files in parallel, which can noticeably shorten processing time for large batches.
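The parallel processing described here follows a common pattern, sketched below under a few assumptions: handler stands in for whatever per-file cleaning function the script uses, and process_all and max_workers=4 are illustrative choices rather than values taken from the script.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def process_all(paths, handler, max_workers=4):
    """Run handler(path) for every path in a thread pool.

    Returns (succeeded, failed), where failed maps each path to its error text.
    """
    succeeded, failed = [], {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Remember which future belongs to which file.
        futures = {pool.submit(handler, path): path for path in paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                future.result()  # re-raises any exception from the worker
                succeeded.append(path)
            except Exception as exc:
                failed[path] = str(exc)
    return succeeded, failed
```

Collecting failures instead of stopping at the first one is what allows a batch to finish with a partial-success report.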
- Comprehensive Logging:
  - Implements Python’s logging module to log informational messages, warnings, and exceptions both to the console and a log file (metadata_removal.log), aiding in easier debugging and maintenance.
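A setup along these lines would produce the console-plus-file behavior described above. It is a minimal sketch: build_logger, the logger name, and the message format are assumptions made for illustration, and only the metadata_removal.log file name comes from this README.

```python
import logging


def build_logger(log_path="metadata_removal.log", name="metadata_removal"):
    """Return a logger that writes to both the console and log_path."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        handlers = (
            logging.StreamHandler(),                          # console
            logging.FileHandler(log_path, encoding="utf-8"),  # log file
        )
        for handler in handlers:
            handler.setFormatter(formatter)
            logger.addHandler(handler)
    return logger
```

Inside an except block, logger.exception("...") records the message together with the traceback, which is usually what you want when a file fails to process.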
- User-Friendly GUI Enhancements:
  - Menu Bar: Includes "Open" and "Exit" options for standard navigation.
  - Listbox: Displays all selected files, providing clarity on what’s being processed.
  - Progress Bar: Visually represents the processing progress.
  - Status Label: Updates users on the current state, such as the number of selected files and processing completion.
- Error Handling & User Feedback:
  - Provides immediate feedback through Tkinter’s message boxes for successes, partial successes, and errors.
  - Logs detailed error messages to metadata_removal.log for traceability.
- Cutified Comments:
  - Added fun emoticons and clear explanations within the code to make it more engaging while maintaining professionalism.
- Testing: Before deploying, thoroughly test the script with various file types to ensure metadata removal works as expected.
- Backup: Always keep backups of original files before bulk processing, especially since re-encoding (even with minimal quality loss) cannot be undone.
- Extensibility: The script is structured to allow easy addition of more file formats or more sophisticated metadata removal techniques as needed.
Enjoy your fully enhanced, production-ready metadata removal tool! 🎉