Decrypt and extract embedded PDF attachments.
Based on https://piep.tech/posts/automatic-password-removal-in-paperless-ngx/
The first step in creating a pre-consumption script is to create a dictionary file. This file lists every password that the script should try when removing password protection from PDF files. To create a dictionary file:
- Open a text editor.
- Enter each password on a new line.
- Save the file as <paperless-ngx_root>/scripts/passwords.txt. For example:

      123456
      123456789
      qwerty
      password
      12345
      qwerty123
      1q2w3e
      12345678
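To illustrate how a script might read this file, here is a minimal Python sketch that loads the dictionary, strips surrounding whitespace, and skips blank lines and duplicates. The function name `load_passwords` is hypothetical and is not taken from the original script.

```python
from pathlib import Path
from typing import List


def load_passwords(path: str) -> List[str]:
    """Return the passwords stored one per line, in their original order."""
    passwords: List[str] = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        candidate = line.strip()  # drop stray spaces around the password
        if candidate and candidate not in passwords:  # skip blanks and duplicates
            passwords.append(candidate)
    return passwords
```

Because each line is stripped, a password that begins or ends with a space cannot be expressed in this sketch; drop the `strip()` call if that matters for your documents.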
Next, write the pre-consumption script. It uses the dictionary file to remove password protection from PDF files automatically and to extract PDF attachments embedded in them.
- Open a text editor.
- Copy the pre-consumption.py script.
- Save the file as <paperless-ngx_root>/scripts/pre-consumption.py.
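For orientation, the password-removal part of such a script could look roughly like the sketch below. It is not the original pre-consumption.py: it assumes the third-party `pikepdf` library is available in the container, that Paperless-ngx passes the file through the `DOCUMENT_WORKING_PATH` environment variable, and that the dictionary sits at the container path used in this guide. The helper names `find_password` and `decrypt_in_place` are hypothetical, and attachment extraction is left out.

```python
#!/usr/bin/env python3
"""Sketch: try each dictionary password on the PDF that is being consumed."""
import os
from pathlib import Path
from typing import Callable, Iterable, Optional

# Assumed container path; it matches the volume mapping used in this guide.
PASSWORD_FILE = Path("/usr/src/paperless/scripts/passwords.txt")


def find_password(candidates: Iterable[str],
                  try_password: Callable[[str], bool]) -> Optional[str]:
    """Return the first candidate accepted by try_password, or None."""
    for candidate in candidates:
        if try_password(candidate):
            return candidate
    return None


def decrypt_in_place(pdf_path: str, password: str) -> bool:
    """Open pdf_path with password; if it is encrypted, save it unencrypted."""
    import pikepdf  # third-party; imported here so find_password works without it

    try:
        with pikepdf.open(pdf_path, password=password,
                          allow_overwriting_input=True) as pdf:
            if pdf.is_encrypted:
                # Saving without encryption settings removes the protection.
                pdf.save(pdf_path)
        return True
    except pikepdf.PasswordError:
        return False


def main() -> None:
    pdf_path = os.environ.get("DOCUMENT_WORKING_PATH")
    if not pdf_path or not pdf_path.lower().endswith(".pdf"):
        return  # nothing to do for non-PDF input
    lines = PASSWORD_FILE.read_text(encoding="utf-8").splitlines()
    candidates = [line.strip() for line in lines if line.strip()]
    found = find_password(candidates, lambda pw: decrypt_in_place(pdf_path, pw))
    if found is None:
        print("no password from the dictionary matched")
    else:
        print("PDF is unlocked or was not protected")


if __name__ == "__main__":
    main()
```

Keeping the search loop in `find_password` separate from the `pikepdf` call makes the loop easy to exercise with a stand-in check, for example `find_password(["a", "b"], lambda pw: pw == "b")`.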
Next, configure Paperless-ngx to run the Python script whenever it processes a new file.
- Open the Docker configuration file of Paperless-ngx: <paperless-ngx_root>/docker-compose.yml
- ℹ️ See the example. Make sure that the script folder is available to the Docker container.

      services.webserver.volumes:
        - <paperless-ngx_root>/scripts:/usr/src/paperless/scripts

- Make sure that the environment file is processed.

      services.env_file: docker-compose.env
- Open the Docker environment file of Paperless-ngx: <paperless-ngx_root>/docker-compose.env
- ℹ️ See the example. Set the script path.

      PAPERLESS_PRE_CONSUME_SCRIPT=/usr/src/paperless/scripts/pre-consumption.py
Recreate the containers so that the changes take effect:

    docker-compose up -d

Check that the environment variable was set properly:

    docker exec -it paperless_webserver_1 printenv \
    | grep PAPERLESS_PRE_CONSUME_SCRIPT

This should yield:

    PAPERLESS_PRE_CONSUME_SCRIPT=/usr/src/paperless/scripts/pre-consumption.py
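Beyond the environment variable, it can help to confirm that the script itself is usable. The sketch below assumes that Paperless-ngx runs the configured file directly as an executable, so the file should exist, carry the execute bit, and start with a shebang line. The function name `check_pre_consume_script` is hypothetical.

```python
import os
from typing import List, Mapping


def check_pre_consume_script(env: Mapping[str, str]) -> List[str]:
    """Return a list of problems with the configured script; empty means OK."""
    path = env.get("PAPERLESS_PRE_CONSUME_SCRIPT")
    if not path:
        return ["PAPERLESS_PRE_CONSUME_SCRIPT is not set"]
    if not os.path.isfile(path):
        return [f"{path} does not exist or is not a file"]
    problems: List[str] = []
    if not os.access(path, os.X_OK):
        problems.append(f"{path} is not executable (try chmod +x)")
    with open(path, "rb") as handle:
        if not handle.readline().startswith(b"#!"):
            problems.append(f"{path} has no shebang line")
    return problems

# Inside the container, check_pre_consume_script(os.environ) would report
# any of the problems above for the currently configured script.
```

An empty list means none of these checks found a problem; it does not prove that the script handles every document correctly.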