nathanvercaemert/IMAGE_PROCESSING

IMAGE_PROCESSING

Build

Builds the image, reusing cached layers where possible. The build is fast when only the later layers have changed.

docker build -t image-processing .

Rebuild (no cache)

Ignores all cached layers and rebuilds everything from scratch. Use this after pushing code changes to GitHub to ensure the cloned repo inside the image is up to date.

docker build --no-cache -t image-processing .

Run

Runs the full pipeline. Expects RAW/ inside the bind-mounted folder. Creates WORKING/ and DATA/ there.

docker run -v "/mnt/c/Users/natha/OneDrive/Desktop/TEST_IMAGE_PROCESSING/DOCKER_TEST:/data" image-processing

Preserve drawings

To keep a copy of the bounding-box drawings before they are cropped away, add --preserve-drawings. The BOUND images are copied to the given directory with the prefix renamed from BOUND to DRAW.

docker run -v "/mnt/c/Users/natha/OneDrive/Desktop/TEST_IMAGE_PROCESSING/DOCKER_TEST:/data" image-processing /data/RAW /data/WORKING /data/DATA --preserve-drawings /data/DRAWINGS
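The copy-and-rename behaviour described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the orchestrator's actual code: the function name `preserve_drawings` and its arguments are hypothetical.

```python
import shutil
from pathlib import Path


def preserve_drawings(working_dir, drawings_dir):
    """Copy BOUND* files into drawings_dir with the prefix changed to DRAW.

    Hypothetical helper: the real orchestrator may do this differently.
    """
    working = Path(working_dir)
    drawings = Path(drawings_dir)
    drawings.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(working.glob("BOUND*")):
        if not src.is_file():
            continue
        # Swap only the leading "BOUND"; the rest of the name is kept.
        dst = drawings / ("DRAW" + src.name[len("BOUND"):])
        shutil.copy2(src, dst)  # copy, so the BOUND image stays available for cropping
        copied.append(dst.name)
    return copied
```

Because the files are copied rather than moved, the later crop step still finds its BOUND inputs in WORKING/.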

Single-file reprocessing

Reprocess a single image after manually editing one of its data files. Pass --single-file together with exactly one of --rotate, --deskew, or --draw to indicate which data file was edited. Stages before the edit point are replayed deterministically from the existing data files. Stages after the edit point are re-detected from scratch (except with --draw, which uses the edited compound data directly).
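The "exactly one mode" rule can be expressed with an argparse mutually exclusive group. The sketch below is an assumption about how such a command line could be validated, not a copy of the orchestrator's parser. It requires the three positional directories, whereas the real entrypoint also runs with no arguments at all.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags shown in this README."""
    parser = argparse.ArgumentParser(prog="image-processing")
    parser.add_argument("raw_dir")
    parser.add_argument("working_dir")
    parser.add_argument("data_dir")
    parser.add_argument("--single-file", metavar="NAME")
    parser.add_argument("--preserve-drawings", metavar="DIR")
    # At most one of the three edit-point flags may be given.
    mode = parser.add_mutually_exclusive_group()
    mode.add_argument("--rotate", action="store_true")
    mode.add_argument("--deskew", action="store_true")
    mode.add_argument("--draw", action="store_true")
    return parser


def parse_args(argv):
    parser = build_parser()
    args = parser.parse_args(argv)
    has_mode = args.rotate or args.deskew or args.draw
    # --single-file and a mode flag only make sense together.
    if args.single_file and not has_mode:
        parser.error("--single-file requires one of --rotate, --deskew, --draw")
    if has_mode and not args.single_file:
        parser.error("--rotate, --deskew and --draw require --single-file")
    return args
```

With this arrangement, passing two mode flags, or a mode flag without --single-file, exits with a usage error before any image is touched.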

--rotate

Use after editing .orientation.txt. Re-detects skew, bounding boxes, and compound data.

docker run -v "/mnt/c/Users/natha/OneDrive/Desktop/TEST_IMAGE_PROCESSING/DOCKER_TEST:/data" image-processing /data/RAW /data/WORKING /data/DATA --single-file RAW_0001.tif --rotate

Required data files:

  • DATA/RAW_0001.tif.orientation.txt — orientation (user-edited)

Overwrites: .skew.txt, .boxes.json, .compound.txt

--deskew

Use after editing .skew.txt. Re-detects bounding boxes and compound data.

docker run -v "/mnt/c/Users/natha/OneDrive/Desktop/TEST_IMAGE_PROCESSING/DOCKER_TEST:/data" image-processing /data/RAW /data/WORKING /data/DATA --single-file RAW_0001.tif --deskew

Required data files:

  • DATA/RAW_0001.tif.orientation.txt — orientation
  • DATA/ROT_0001.tif.skew.txt — skew angle and confidence (user-edited)

Overwrites: .boxes.json, .compound.txt

--draw

Use after editing .compound.txt. Bounding box data is ignored. The edited compound data defines both the drawn rectangle and the crop region.

docker run -v "/mnt/c/Users/natha/OneDrive/Desktop/TEST_IMAGE_PROCESSING/DOCKER_TEST:/data" image-processing /data/RAW /data/WORKING /data/DATA --single-file RAW_0001.tif --draw

Required data files:

  • DATA/RAW_0001.tif.orientation.txt — orientation
  • DATA/ROT_0001.tif.skew.txt — skew angle and confidence
  • DATA/SKEW_0001.tif.compound.txt — crop coordinates (user-edited)
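The required-file lists for the three modes follow one pattern: each later edit point needs everything the earlier ones need, plus one more file whose name carries the next stage prefix. A small sketch of that rule, using a hypothetical helper `required_data_files` (not part of the repository), can serve as a pre-flight check:

```python
from pathlib import PurePosixPath


def required_data_files(raw_name, mode, data_dir="DATA"):
    """List the data files a single-file run needs, per this README.

    Hypothetical helper; mode is "rotate", "deskew" or "draw".
    """
    if mode not in ("rotate", "deskew", "draw"):
        raise ValueError(f"unknown mode: {mode}")
    if not raw_name.startswith("RAW"):
        raise ValueError(f"expected a RAW-prefixed name: {raw_name}")
    rest = raw_name[len("RAW"):]  # e.g. "_0001.tif"
    data = PurePosixPath(data_dir)
    files = [data / f"RAW{rest}.orientation.txt"]
    if mode in ("deskew", "draw"):
        files.append(data / f"ROT{rest}.skew.txt")
    if mode == "draw":
        files.append(data / f"SKEW{rest}.compound.txt")
    return [str(p) for p in files]
```

For example, `required_data_files("RAW_0001.tif", "draw")` returns the three paths listed above for --draw.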

Combining with --preserve-drawings

All three modes can be combined with --preserve-drawings:

docker run -v "/mnt/c/Users/natha/OneDrive/Desktop/TEST_IMAGE_PROCESSING/DOCKER_TEST:/data" image-processing /data/RAW /data/WORKING /data/DATA --single-file RAW_0001.tif --deskew --preserve-drawings /data/DRAWINGS

Cleanup

Removes stopped containers created from the image and prunes dangling layers. The built image itself is kept.

docker rm $(docker ps -a -q --filter ancestor=image-processing) 2>/dev/null; docker image prune -f

Individual Scripts

All scripts live in /opt/image_processing inside the container. Override the entrypoint to run them directly.
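Since every stage command below repeats the same bind mount and entrypoint override, it can be convenient to build the argument list in one place. The helper below is a hypothetical convenience wrapper, not something shipped in the repository. It only assembles the same `docker run` arguments used throughout this README.

```python
def docker_script_command(host_dir, script, *script_args, image="image-processing"):
    """Build the argv for running one pipeline script inside the container.

    Hypothetical helper: host_dir is bind-mounted at /data, and the
    entrypoint is overridden with python, as in the commands in this README.
    """
    return [
        "docker", "run",
        "-v", f"{host_dir}:/data",
        "--entrypoint", "python",
        image,
        script,
        *script_args,
    ]


# Example use (requires Docker and the built image):
#   import subprocess
#   subprocess.run(
#       docker_script_command("/path/to/folder", "determine_skew_angle.py",
#                             "/data/WORKING", "/data/DATA"),
#       check=True,
#   )
```

Passing the command as a list avoids shell quoting problems when the host path contains spaces.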

Stage 1: ICC Profile Assignment + Conversion

Assign scanner ICC profile to every image in RAW/, convert to ProPhoto working space, write results to WORKING/.

docker run -v "/mnt/c/Users/natha/OneDrive/Desktop/TEST_IMAGE_PROCESSING/DOCKER_TEST:/data" --entrypoint python image-processing batch_assign_convert_icc_profile.py /data/RAW /data/WORKING --scanner Scanner.icc --working ProPhoto.icc

Stage 2a: Detect Orientation

Detect whether each image in WORKING/ is upright (0 degrees) or upside down (180 degrees) using Tesseract OSD. Writes .orientation.txt files to DATA/.

docker run -v "/mnt/c/Users/natha/OneDrive/Desktop/TEST_IMAGE_PROCESSING/DOCKER_TEST:/data" --entrypoint python image-processing detect_orientation_with_tesseract_osd.py /data/WORKING /data/DATA

Stage 2b: Fix Upside-Down

Rotate 180-degree images upright using ImageMagick. Renames RAW* to ROT* in WORKING/ (rotating where needed).

docker run -v "/mnt/c/Users/natha/OneDrive/Desktop/TEST_IMAGE_PROCESSING/DOCKER_TEST:/data" --entrypoint python image-processing fix_upside_down_with_magick.py /data/WORKING /data/DATA

Stage 3a: Determine Skew Angle

Detect skew angle for each ROT-prefixed image in WORKING/ using Leptonica. Writes .skew.txt files to DATA/.

docker run -v "/mnt/c/Users/natha/OneDrive/Desktop/TEST_IMAGE_PROCESSING/DOCKER_TEST:/data" --entrypoint python image-processing determine_skew_angle.py /data/WORKING /data/DATA

Stage 3b: Apply Deskew

Rotate images to correct skew using pyvips. Renames ROT* to SKEW* in WORKING/ (deskewing where needed).

docker run -v "/mnt/c/Users/natha/OneDrive/Desktop/TEST_IMAGE_PROCESSING/DOCKER_TEST:/data" --entrypoint python image-processing apply_deskew_with_pyvips.py /data/WORKING /data/DATA

Stage 4a: Detect Bounding Boxes

Detect text bounding boxes for each SKEW-prefixed image in WORKING/ using PaddleOCR. Writes .boxes.json files to DATA/.

docker run -v "/mnt/c/Users/natha/OneDrive/Desktop/TEST_IMAGE_PROCESSING/DOCKER_TEST:/data" --entrypoint python image-processing detect_bounding_boxes_with_paddleocr.py /data/WORKING /data/DATA

Stage 4b: Draw Compound Bounding Boxes

Compute and draw a compound bounding rectangle around all detected text regions. Renames SKEW* to BOUND* in WORKING/.

docker run -v "/mnt/c/Users/natha/OneDrive/Desktop/TEST_IMAGE_PROCESSING/DOCKER_TEST:/data" --entrypoint python image-processing draw_compound_bounding_boxes.py /data/WORKING /data/DATA
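A compound bounding rectangle is the smallest axis-aligned rectangle that contains every detected region. The sketch below shows that computation under an assumed input format: each region is a list of (x, y) corner points. The actual layout of .boxes.json and .compound.txt is not documented here, so treat the data shapes and the function name `compound_box` as illustrative only.

```python
def compound_box(polygons):
    """Return (left, top, right, bottom) enclosing all polygons, or None.

    Assumed input: an iterable of polygons, each a sequence of (x, y) points.
    """
    xs = [x for poly in polygons for x, _ in poly]
    ys = [y for poly in polygons for _, y in poly]
    if not xs:
        return None  # no text regions detected
    return (min(xs), min(ys), max(xs), max(ys))


regions = [
    [(10, 20), (50, 20), (50, 60), (10, 60)],
    [(30, 5), (80, 5), (80, 40), (30, 40)],
]
print(compound_box(regions))  # (10, 5, 80, 60)
```

Taking the minimum and maximum over all corner points handles slightly rotated quadrilaterals as well as plain rectangles.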

Stage 4c: Crop Compound Bounding Boxes

Crop images to their compound bounding box. Renames BOUND* to CROP* in WORKING/.

docker run -v "/mnt/c/Users/natha/OneDrive/Desktop/TEST_IMAGE_PROCESSING/DOCKER_TEST:/data" --entrypoint python image-processing crop_compound_bounding_boxes.py /data/WORKING /data/DATA

Note: When run via the orchestrator with --preserve-drawings /data/DRAWINGS, BOUND images are copied to the drawings directory (prefix renamed to DRAW) before this crop step executes. The standalone script above does not support this option; it is available only through the orchestrator.
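Taken together, the stages move each file in WORKING/ through a fixed chain of prefixes: RAW, ROT, SKEW, BOUND, CROP. A minimal sketch of that naming rule, with a hypothetical function `next_stage_name`, makes the chain explicit:

```python
STAGE_PREFIXES = ["RAW", "ROT", "SKEW", "BOUND", "CROP"]


def next_stage_name(filename):
    """Return the name a file takes after its next rename stage.

    Hypothetical helper based on the prefix renames described in this README.
    """
    for current, following in zip(STAGE_PREFIXES, STAGE_PREFIXES[1:]):
        if filename.startswith(current):
            return following + filename[len(current):]
    raise ValueError(f"no further stage for: {filename}")


name = "RAW_0001.tif"
chain = [name]
while not name.startswith("CROP"):
    name = next_stage_name(name)
    chain.append(name)
print(chain)
# ['RAW_0001.tif', 'ROT_0001.tif', 'SKEW_0001.tif', 'BOUND_0001.tif', 'CROP_0001.tif']
```

The prefix of a file in WORKING/ therefore indicates the last stage that completed for it.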
