An intelligent PDF processing application that uses computer vision and OCR to automatically detect, classify, and extract text from Piping & Instrumentation Diagrams (P&ID). Built with a custom-trained YOLO model for shape detection and multiple OCR engines for text extraction.
- Custom YOLO Model: Trained specifically for P&ID component detection
- Multi-OCR Support: Choose between PaddleOCR, EasyOCR, or custom OCR pipelines
- Interactive GUI: Drag-and-drop interface built with Tkinter
- Batch Processing: Process multiple PDF pages efficiently
- Export Functionality: Export results to Excel (XLSX) format
- Visual Detection: View detected components with bounding boxes and confidence scores
- Robust Error Handling: Multiple fallback strategies for reliable processing
- Python 3.8 or higher
- Windows/Linux/macOS
- Minimum 8GB RAM (16GB recommended for large PDFs)
- CUDA-compatible GPU (optional, for faster processing)
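Python packages:

```text
ultralytics>=8.0.0
opencv-python>=4.5.0
pandas>=1.3.0
matplotlib>=3.3.0
pdf2image>=2.1.0
pillow>=8.0.0
numpy>=1.21.0
# tkinter: bundled with most Python installers, not installed via pip
# (on Debian/Ubuntu: sudo apt-get install python3-tk)
tkinterdnd2>=0.3.0
easyocr>=1.6.0
paddlepaddle>=2.4.0
paddlex>=2.1.0
```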
- Poppler: Required for PDF processing
  - Windows: Download from poppler-windows
  - Ubuntu/Debian:
    ```bash
    sudo apt-get install poppler-utils
    ```
  - macOS:
    ```bash
    brew install poppler
    ```
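"Poppler not found" is easy to catch before processing starts. A minimal sketch of a pre-flight check using only the standard library, assuming the `poppler_path` you configure is Poppler's `bin` directory (the folder pdf2image expects) and that `pdftoppm` is present there; `find_poppler_bin` is a hypothetical helper, not part of this project:

```python
import shutil
from pathlib import Path
from typing import Optional


def find_poppler_bin(poppler_path: Optional[str] = None) -> Optional[str]:
    """Return the directory that contains Poppler's `pdftoppm`, or None.

    Hypothetical helper: checks an explicit `poppler_path` (typical on
    Windows) and otherwise searches the system PATH (typical on Linux/macOS).
    """
    if poppler_path:
        for exe in ("pdftoppm", "pdftoppm.exe"):
            if (Path(poppler_path) / exe).is_file():
                return str(Path(poppler_path))
        return None
    found = shutil.which("pdftoppm")
    return str(Path(found).parent) if found else None


if __name__ == "__main__":
    location = find_poppler_bin()
    if location:
        print(f"Poppler found in {location}")
    else:
        print("Poppler not found: install it or set poppler_path")
```

On Windows, pass the same directory you put in `poppler_path`, e.g. `find_poppler_bin(r"C:\path\to\poppler\bin")` (placeholder path).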
1. Clone the repository
   ```bash
   git clone https://github.com/yourusername/pid-pdf-processor.git
   cd pid-pdf-processor
   ```
2. Create a virtual environment (recommended)
   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```
3. Install dependencies
   ```bash
   pip install -r requirements.txt
   ```
4. Download and set up Poppler
   - Update the `poppler_path` variable in `PDFProcessor.py` to match your installation
5. Place your trained YOLO model
   - Put your `best.pt` model file in the `models/` directory, or
   - Update the model path in the `get_model()` function
```bash
python main.py
```

- Load PDF: Drag and drop a PDF file or use the browse button
- Processing: The application will automatically detect P&ID components
- Review Results: View detected shapes and extracted text in the data grid
- Export: Save results to Excel format
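```python
from PDFProcessor import get_data_from_pdf_easyocr

# Process a PDF with EasyOCR; the result is a pandas DataFrame of detections
df = get_data_from_pdf_easyocr(
    pdf_path="your_pid_diagram.pdf",
    progress_callback=None,   # no progress reporting
    visualize="matplotlib",   # display detections with Matplotlib
)

# Export results (pandas needs openpyxl installed to write .xlsx files)
df.to_excel("results.xlsx", index=False)
```

- Collect P&ID Images: Gather diverse P&ID diagrams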
- Annotation: Use tools like LabelImg or Roboflow
- Classes: Define your P&ID component classes (e.g., valves, pumps, instruments, pipes)
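Ultralytics detection labels are plain-text files with one object per line in the form `class_id x_center y_center width height`, with all four coordinates normalised to the image size. Annotation tools such as LabelImg or Roboflow can export this directly; if your boxes come from elsewhere as pixel `(x, y, width, height)` with a top-left origin, a minimal conversion sketch (`to_yolo_line` is a hypothetical helper; `class_id` is assumed to be the index into `names` in your dataset YAML):

```python
def to_yolo_line(class_id: int, x: float, y: float, w: float, h: float,
                 img_w: int, img_h: int) -> str:
    """Convert a pixel box (top-left x, y, width, height) to a YOLO label line.

    Output: `class_id x_center y_center width height`, each normalised
    by the image width or height so values fall in [0, 1].
    """
    if img_w <= 0 or img_h <= 0:
        raise ValueError("image dimensions must be positive")
    xc = (x + w / 2) / img_w
    yc = (y + h / 2) / img_h
    return f"{class_id} {xc:.6f} {yc:.6f} {w / img_w:.6f} {h / img_h:.6f}"


if __name__ == "__main__":
    # A 50x40 px box at (100, 200) on a 1000x800 px image, class 0
    print(to_yolo_line(0, 100, 200, 50, 40, 1000, 800))
    # prints "0 0.125000 0.275000 0.050000 0.050000"
```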
```bash
# Install Ultralytics
pip install ultralytics

# Train the model
yolo train data=pid_dataset.yaml model=yolov8n.pt epochs=100 imgsz=640

# Validate the model
yolo val model=runs/detect/train/weights/best.pt data=pid_dataset.yaml

# Run inference
yolo predict model=runs/detect/train/weights/best.pt source=test_images/
```

```text
dataset/
├── images/
│   ├── train/
│   ├── val/
│   └── test/
├── labels/
│   ├── train/
│   ├── val/
│   └── test/
└── pid_dataset.yaml
```
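In this layout each `images/<split>/name.ext` is paired with `labels/<split>/name.txt`; Ultralytics treats an image without a label file as a background image with no objects, which is rarely intended for an annotated P&ID. A minimal sketch of a consistency check, assuming the directory names shown above and common image extensions; `find_unlabelled_images` is a hypothetical helper:

```python
from pathlib import Path
from typing import Dict, Iterable, List

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff"}


def find_unlabelled_images(dataset_root: str,
                           splits: Iterable[str] = ("train", "val", "test")
                           ) -> Dict[str, List[str]]:
    """Map each split to the image files that have no matching label file."""
    root = Path(dataset_root)
    missing: Dict[str, List[str]] = {}
    for split in splits:
        image_dir = root / "images" / split
        label_dir = root / "labels" / split
        if not image_dir.is_dir():
            continue  # split not present; nothing to check
        missing[split] = sorted(
            p.name
            for p in image_dir.iterdir()
            if p.suffix.lower() in IMAGE_SUFFIXES
            and not (label_dir / f"{p.stem}.txt").is_file()
        )
    return missing


if __name__ == "__main__":
    for split, names in find_unlabelled_images("dataset").items():
        print(f"{split}: {len(names)} image(s) without labels")
```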
```yaml
path: ./dataset
train: images/train
val: images/val
test: images/test
nc: 8  # number of classes
names: ['valve', 'pump', 'instrument', 'pipe', 'tank', 'heat_exchanger', 'compressor', 'control_valve']
```

Choose your preferred OCR engine by calling the appropriate function:
- EasyOCR (Recommended): `get_data_from_pdf_easyocr()`
- PaddleOCR: `get_data_from_pdf()`
- Memory-based: `get_data_from_pdf_memory()`
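If the engine should be configurable (for example from a command-line flag), a name-to-function lookup keeps the choice in one place. A minimal sketch, assuming the three functions above are importable from `PDFProcessor`; `ENGINE_FUNCTIONS`, `resolve_engine_name` and `get_engine` are hypothetical names, and the module is imported lazily so the lookup itself has no heavy dependencies:

```python
import importlib
from typing import Callable

# Short engine names mapped to the PDFProcessor functions listed above.
ENGINE_FUNCTIONS = {
    "easyocr": "get_data_from_pdf_easyocr",
    "paddleocr": "get_data_from_pdf",
    "memory": "get_data_from_pdf_memory",
}


def resolve_engine_name(name: str) -> str:
    """Map a user-facing engine name (case-insensitive) to a function name."""
    key = name.strip().lower()
    if key not in ENGINE_FUNCTIONS:
        raise ValueError(
            f"unknown OCR engine {name!r}; choose from {sorted(ENGINE_FUNCTIONS)}"
        )
    return ENGINE_FUNCTIONS[key]


def get_engine(name: str, module_name: str = "PDFProcessor") -> Callable:
    """Import the processing module lazily and return the chosen function."""
    module = importlib.import_module(module_name)
    return getattr(module, resolve_engine_name(name))
```

For example, `get_engine("easyocr")(pdf_path="your_pid_diagram.pdf", visualize=None)` matches the EasyOCR call shown earlier. The parameters of `get_data_from_pdf()` and `get_data_from_pdf_memory()` are not documented here, so check their signatures before calling them the same way.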
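Set the visualization mode with the `visualize` argument:

```python
from PDFProcessor import get_data_from_pdf_easyocr

# OpenCV visualization
get_data_from_pdf_easyocr(pdf_path="your_pid_diagram.pdf", visualize='cv2')

# Matplotlib visualization (default)
get_data_from_pdf_easyocr(pdf_path="your_pid_diagram.pdf", visualize='matplotlib')

# No visualization
get_data_from_pdf_easyocr(pdf_path="your_pid_diagram.pdf", visualize=None)
```

The application generates a structured DataFrame with the following columns: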
| Column | Description |
|---|---|
| Shape | Detected P&ID component type |
| Label | Extracted text from OCR |
| X, Y | Top-left coordinates of bounding box |
| Width, Height | Dimensions of detected component |
| PDF Name | Source PDF filename |
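Because the output is an ordinary pandas DataFrame, common follow-up steps are a few lines each. A minimal sketch, assuming `X`, `Y`, `Width` and `Height` are separate numeric columns with exactly these names and share one unit (the table does not state the unit); `add_box_edges` and `count_shapes` are hypothetical helpers, and `sample` contains made-up values, not real output:

```python
import pandas as pd


def add_box_edges(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `df` with bottom-right corner (X2, Y2) and centre (CX, CY)."""
    out = df.copy()
    out["X2"] = out["X"] + out["Width"]
    out["Y2"] = out["Y"] + out["Height"]
    out["CX"] = out["X"] + out["Width"] / 2
    out["CY"] = out["Y"] + out["Height"] / 2
    return out


def count_shapes(df: pd.DataFrame) -> pd.DataFrame:
    """Count detections per source PDF and shape type, most frequent first."""
    return (
        df.groupby(["PDF Name", "Shape"])
        .size()
        .reset_index(name="Count")
        .sort_values(["PDF Name", "Count"], ascending=[True, False], ignore_index=True)
    )


# Small illustrative frame with made-up values (not real output).
sample = pd.DataFrame({
    "Shape": ["valve", "valve", "pump"],
    "Label": ["V-101", "V-102", "P-201"],
    "X": [10, 50, 200],
    "Y": [20, 60, 300],
    "Width": [30, 30, 80],
    "Height": [30, 30, 60],
    "PDF Name": ["unit_a.pdf"] * 3,
})

if __name__ == "__main__":
    print(add_box_edges(sample)[["Label", "X2", "Y2", "CX", "CY"]])
    print(count_shapes(sample))
```

`count_shapes` gives a quick per-drawing component inventory that can be written out with `to_excel` alongside the raw detections.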
- Model not found
  - Ensure `best.pt` is in the correct directory
  - Check file permissions
- Poppler not found
  - Verify the Poppler installation
  - Update `poppler_path` in the configuration
- OCR failures
  - Try a different OCR engine
  - Check image quality and resolution
  - Ensure sufficient memory is available
- Memory issues
  - Reduce the batch size
  - Use sequential instead of parallel processing
  - Close other applications to free up RAM
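One way to act on the memory tips is to rasterise only a few pages at a time. A minimal sketch, assuming you convert pages with pdf2image yourself (the project's own batching code is not shown here); `page_batches` and `iter_page_images` are hypothetical helpers, while `convert_from_path` (with its `first_page`/`last_page` arguments, 1-based and inclusive) and `pdfinfo_from_path` are pdf2image functions:

```python
from typing import Iterator, Optional, Tuple


def page_batches(page_count: int, batch_size: int) -> Iterator[Tuple[int, int]]:
    """Yield inclusive 1-based (first_page, last_page) ranges covering a PDF."""
    if page_count < 0 or batch_size < 1:
        raise ValueError("page_count must be >= 0 and batch_size >= 1")
    for first in range(1, page_count + 1, batch_size):
        yield first, min(first + batch_size - 1, page_count)


def iter_page_images(pdf_path: str, batch_size: int = 2, dpi: int = 200,
                     poppler_path: Optional[str] = None):
    """Yield PIL images page by page, holding at most `batch_size` at once."""
    # Imported lazily: requires pdf2image and a working Poppler install.
    from pdf2image import convert_from_path, pdfinfo_from_path

    total = pdfinfo_from_path(pdf_path, poppler_path=poppler_path)["Pages"]
    for first, last in page_batches(total, batch_size):
        yield from convert_from_path(pdf_path, dpi=dpi, first_page=first,
                                     last_page=last, poppler_path=poppler_path)
```

Each yielded image can be passed to detection and OCR and then discarded, so peak memory scales with `batch_size` rather than with the page count.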
- GPU Acceleration: Ensure CUDA is properly installed for faster inference
- Image Preprocessing: Adjust DPI and image enhancement parameters
- Model Optimization: Consider YOLOv8s or YOLOv8m instead of YOLOv8n; the larger models are typically more accurate but slower
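When tuning DPI, note that the pixel count of a rendered page, and with it the memory per page, grows with the square of the DPI: doubling DPI quadruples it. A minimal back-of-the-envelope sketch, assuming uncompressed 8-bit RGB (3 bytes per pixel) and large-format sheets such as ISO A3 (297 x 420 mm) or A1 (594 x 841 mm); the helper names are hypothetical:

```python
MM_PER_INCH = 25.4


def rendered_size_mm(width_mm: float, height_mm: float, dpi: int) -> tuple:
    """Approximate pixel size of a page of the given paper size rendered at `dpi`."""
    return (round(width_mm / MM_PER_INCH * dpi),
            round(height_mm / MM_PER_INCH * dpi))


def rgb_megabytes(width_mm: float, height_mm: float, dpi: int) -> float:
    """Uncompressed 8-bit RGB size of one rendered page, in MB (10**6 bytes)."""
    w, h = rendered_size_mm(width_mm, height_mm, dpi)
    return w * h * 3 / 1e6


if __name__ == "__main__":
    for name, size in {"A3": (297, 420), "A1": (594, 841)}.items():
        for dpi in (150, 300):
            w, h = rendered_size_mm(*size, dpi)
            print(f"{name} @ {dpi} DPI: {w}x{h} px, ~{rgb_megabytes(*size, dpi):.0f} MB")
```

Actual usage is higher once OpenCV copies, grayscale conversions and OCR buffers are added, so treat these figures as a lower bound per page held in memory.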
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
```bash
# Install development dependencies
pip install -r requirements-dev.txt

# Run tests
python -m pytest tests/

# Code formatting
black .
isort .
```

This project is licensed under the MIT License. See the LICENSE file for details.
- Ultralytics YOLOv8 for object detection
- EasyOCR for text recognition
- PaddleOCR for alternative OCR capabilities
- pdf2image for PDF processing
For questions, issues, or feature requests, please:
- Check the Issues page
- Create a new issue with detailed information
- Contact: [email protected]
- Support for multi-page PDF processing
- Advanced P&ID component relationship mapping
- Integration with CAD software APIs
- Real-time processing capabilities
- Web-based interface option
- Docker containerization
- Cloud deployment options
Made with ❤️ for the Process Engineering Community