An interactive Streamlit app for visually defining and extracting tabular data from geotechnical borelogs or other structured-but-inconsistently-formatted PDFs.
Geotechnical Borelog Digitizer helps convert complex borelog PDFs into clean, structured data — without needing fixed templates or OCR-heavy workflows.
This tool provides a visual interface where you can:
- Define column boundaries by clicking directly on the PDF.
- Adjust, rename, and delete columns dynamically.
- Set header/footer cutoff regions to exclude unwanted areas.
- Extract data from single or multiple pages into Excel format.
It’s particularly useful for SPT logs, field sheets, or borehole records where the text is digital yet not arranged in consistent columns, causing parsing errors in typical PDF tools.
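To illustrate the idea behind column boundaries, here is a minimal sketch of how a word could be matched to a column by its horizontal position. It assumes words are dictionaries with `text`, `x0`, `x1`, and `top` keys (the shape returned by pdfplumber's `extract_words()`); the `COLUMNS` values and the `assign_column` helper are hypothetical and not taken from the app's source.

```python
from typing import Optional

# Hypothetical column definitions: name -> (xmin, xmax) in PDF points.
COLUMNS = {
    "Depth": (40.0, 90.0),
    "Soil Type": (90.0, 300.0),
    "SPT N": (300.0, 360.0),
}


def assign_column(word: dict, columns: dict) -> Optional[str]:
    """Return the column whose x-range contains the word's horizontal midpoint."""
    mid_x = (word["x0"] + word["x1"]) / 2
    for name, (xmin, xmax) in columns.items():
        if xmin <= mid_x < xmax:
            return name
    return None  # the word lies outside every defined column


# Sample words shaped like pdfplumber's extract_words() output.
words = [
    {"text": "1.50", "x0": 52.0, "x1": 70.0, "top": 120.0},
    {"text": "CLAY", "x0": 95.0, "x1": 125.0, "top": 120.0},
    {"text": "12", "x0": 320.0, "x1": 332.0, "top": 120.5},
    {"text": "Page", "x0": 500.0, "x1": 525.0, "top": 120.0},
]
labels = [assign_column(w, COLUMNS) for w in words]
```

Using the midpoint rather than the left edge makes the match tolerant of text that is slightly misaligned within a column, which is the typical failure case in these logs.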
| Feature | Description |
|---|---|
| 🖱️ Interactive Column Creation | Click on the PDF to define column ranges (xmin/xmax). |
| 🧭 Column Management | Rename, delete, or manually edit column boundaries. |
| 📏 Header/Footer Cutoffs | Ignore fixed regions such as titles, legends, or notes. |
| 📄 Page Navigation | Quickly switch between pages with buttons or number input. |
| ⚙️ Data Extraction | Export current or all pages to .xlsx. |
| 🖼️ PDF Visualization | Shows all column lines and cutoff markers directly on the page image. |
| 🔍 Optional Click-to-Coordinates Support | Uses streamlit-image-coordinates for interactive clicks. |
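The header/footer cutoffs can be thought of as a vertical filter applied before any column matching. The sketch below is an assumption about how such a filter might look, not the app's actual code: `within_cutoffs` and `filter_words` are hypothetical helpers, and coordinates are measured from the top of the page, as in pdfplumber's `top` and `bottom` word keys.

```python
def within_cutoffs(word: dict, header_cutoff: float, footer_cutoff: float) -> bool:
    """Keep a word only if it lies fully between the header and footer cutoffs."""
    return word["top"] >= header_cutoff and word["bottom"] <= footer_cutoff


def filter_words(words: list, header_cutoff: float, footer_cutoff: float) -> list:
    """Drop words that fall in the header or footer regions."""
    return [w for w in words if within_cutoffs(w, header_cutoff, footer_cutoff)]


# Sample words: a title, one body entry, and a footer note.
page_words = [
    {"text": "BOREHOLE", "top": 30.0, "bottom": 42.0},
    {"text": "CLAY", "top": 120.0, "bottom": 130.0},
    {"text": "Notes:", "top": 760.0, "bottom": 770.0},
]
body = filter_words(page_words, header_cutoff=100.0, footer_cutoff=750.0)
```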
Make sure you have Python 3.9+ installed, then install the required packages:
pip install streamlit streamlit-image-coordinates pdfplumber pandas openpyxl pillow
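If you want to confirm the installation before launching the app, a short check like the following can help. It is only a convenience sketch: `missing_modules` is a hypothetical helper, and the list uses the import names of the packages above (Pillow is imported as `PIL`, streamlit-image-coordinates as `streamlit_image_coordinates`).

```python
import importlib.util

# Import names corresponding to the packages in the pip command.
REQUIRED_MODULES = [
    "streamlit",
    "streamlit_image_coordinates",
    "pdfplumber",
    "pandas",
    "openpyxl",
    "PIL",
]


def missing_modules(modules: list) -> list:
    """Return the module names that cannot be found in the current environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("Missing:", ", ".join(missing) if missing else "none")
```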
- Navigate to your project folder. Example (Windows):
cd "C:\Users\YourName\Desktop\Borelog_GUI"
- Run the Streamlit app:
streamlit run pdf_table_extractor_gui_04.py
- After running the above command, Streamlit will automatically start a local web server. You’ll see something like this in your Command Prompt:
  You can now view your Streamlit app in your browser.
  Local URL: http://localhost:8501
  Network URL: http://192.168.x.x:8501
- Open your browser (it usually opens automatically) and go to: 👉 http://localhost:8501
- Upload your test PDF (for example, spt_a1.pdf, included in this repo) and start using the app.
A sample file spt_a1.pdf is included in this repository.
It demonstrates a typical borelog layout with aligned but inconsistently spaced digital text columns.
- Upload spt_a1.pdf.
- Use the click tool to set the left/right boundaries of each column.
- Rename columns (e.g., Soil Type, Depth, SPT N, Moisture %).
- Adjust header/footer cutoffs to trim noise.
- Click Extract Current Page or Extract All Pages.
- Download your structured Excel file. ✅
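The workflow above ends with one table row per log entry. As a rough sketch of how the extraction step might turn positioned words into rows, the example below groups words by vertical position and then places each word in a column by its midpoint. The `group_rows` function, the `y_tolerance` value, and the sample coordinates are all assumptions for illustration; the app's own logic may differ.

```python
def group_rows(words: list, columns: dict, y_tolerance: float = 3.0) -> list:
    """Group words into rows by vertical position, then bucket them by column."""
    rows = []  # each entry: {"top": float, "cells": {column: [(x0, text), ...]}}
    for word in sorted(words, key=lambda w: w["top"]):
        mid_x = (word["x0"] + word["x1"]) / 2
        column = next(
            (name for name, (xmin, xmax) in columns.items() if xmin <= mid_x < xmax),
            None,
        )
        if column is None:
            continue  # ignore words outside all defined columns
        # Start a new row when the word sits clearly below the current row.
        if not rows or word["top"] - rows[-1]["top"] > y_tolerance:
            rows.append({"top": word["top"], "cells": {name: [] for name in columns}})
        rows[-1]["cells"][column].append((word["x0"], word["text"]))
    # Join the words of each cell in left-to-right order.
    return [
        {name: " ".join(text for _, text in sorted(cell)) for name, cell in row["cells"].items()}
        for row in rows
    ]


columns = {"Depth": (40.0, 90.0), "Soil Type": (90.0, 300.0), "SPT N": (300.0, 360.0)}
sample_words = [
    {"text": "1.50", "x0": 52.0, "x1": 70.0, "top": 120.0},
    {"text": "Silty", "x0": 95.0, "x1": 120.0, "top": 120.4},
    {"text": "CLAY", "x0": 124.0, "x1": 150.0, "top": 119.8},
    {"text": "12", "x0": 320.0, "x1": 332.0, "top": 120.5},
    {"text": "3.00", "x0": 52.0, "x1": 70.0, "top": 140.0},
    {"text": "SAND", "x0": 95.0, "x1": 125.0, "top": 140.2},
    {"text": "27", "x0": 320.0, "x1": 332.0, "top": 140.1},
]
table = group_rows(sample_words, columns)
```

A list of dictionaries like `table` can be written to Excel with `pandas.DataFrame(table).to_excel("output.xlsx", index=False)`, which relies on openpyxl being installed.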
📂 Geotechnical-Borelog-Digitizer
├── pdf_table_extractor_gui_04.py # Main Streamlit app
├── spt_a1.pdf # Example test file
├── assets/
│ └── demo_preview.png
└── README.md
- The click-to-define feature is optional but highly recommended. Install it via:
pip install streamlit-image-coordinates
- If your column labels overlap, adjust the label background thickness in the code (search for # Draw vertical (rotated) label).
- Use higher DPI PDFs for sharper previews.
- Keep the Command Prompt open while Streamlit is running; closing it will stop the app.
- Add thicker semi-transparent backgrounds behind column labels.
- Optimize performance when defining many columns.
- Add export options for CSV/JSON/AGS4.
- Improve multi-page extraction preview.
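For the planned CSV/JSON export, one possible starting point is sketched below using only the standard library. It assumes extracted rows are a list of dictionaries keyed by column name; `rows_to_csv` and `rows_to_json` are hypothetical helpers that do not exist in the app yet, and AGS4 output is not covered here.

```python
import csv
import io
import json


def rows_to_csv(rows: list) -> str:
    """Serialize a list of row dictionaries to CSV text, with a header line."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def rows_to_json(rows: list) -> str:
    """Serialize a list of row dictionaries to pretty-printed JSON text."""
    return json.dumps(rows, indent=2, ensure_ascii=False)


example_rows = [{"Depth": "1.50", "Soil Type": "Silty CLAY", "SPT N": "12"}]
csv_text = rows_to_csv(example_rows)
json_text = rows_to_json(example_rows)
```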
This project is released under the MIT License — feel free to use, modify, and distribute it.
Ali Yaz
📅 October 2025
📫 Email: [email protected]
💬 “Turning messy borelogs into structured data — one click at a time.”
