Invoice2Excel is an ETL-style desktop tool that extracts structured data from PDF invoices and transforms it into clean, validated, and analysis-ready Excel spreadsheets.
Designed with data analysts, accountants, and operations teams in mind, this tool dramatically reduces manual data entry time while ensuring consistency in formatting and output.
Built using Python and Streamlit for a friendly UI, it leverages:
- pdfplumber, PyMuPDF, and regex for data extraction
- pandas for transformation
- OpenPyXL for polished Excel output (including summary sheets, formulas, and optional pivot-style layouts)
Drag-and-drop or browse to select PDF invoices for parsing.
- Automatically detects invoice structures using regex and layout cues
- Extracts key fields: invoice number, date, vendor, line items, quantities, prices, and taxes
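The regex-based field extraction described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual parser: the patterns, the `extract_fields` helper, and the sample text are all assumptions, and the input is taken to be plain text of the kind pdfplumber's `page.extract_text()` returns.

```python
import re

# Hypothetical patterns for illustration; real invoices vary widely by vendor.
FIELD_PATTERNS = {
    "invoice_number": re.compile(
        r"Invoice\s*(?:No\.?|Number|#)\s*[:\-]?\s*([A-Z0-9\-]+)", re.IGNORECASE
    ),
    "date": re.compile(r"Date\s*[:\-]?\s*(\d{4}-\d{2}-\d{2})", re.IGNORECASE),
    "vendor": re.compile(r"Vendor\s*[:\-]?\s*(.+)", re.IGNORECASE),
}

# Assumed line-item layout: "<description> <quantity> <unit price>" on one line.
LINE_ITEM = re.compile(
    r"^(?P<description>.+?)\s+(?P<quantity>\d+)\s+(?P<unit_price>\d+\.\d{2})$",
    re.MULTILINE,
)


def extract_fields(text):
    """Return header fields and line items found in already-extracted invoice text."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        # Missing fields are kept as None so later cleaning steps can handle them.
        fields[name] = match.group(1).strip() if match else None
    fields["line_items"] = [m.groupdict() for m in LINE_ITEM.finditer(text)]
    return fields


sample_text = """Invoice No: INV-1001
Date: 2024-03-15
Vendor: Acme Supplies
Widget A 2 9.99
Widget B 10 1.50
"""

result = extract_fields(sample_text)
print(result["invoice_number"], result["date"], result["vendor"])
print(result["line_items"])
```

All values come back as strings at this stage; converting quantities and prices to numbers is left to the cleaning step.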
Tailor rules to specific vendors or invoice formats using YAML/JSON config files.
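As a rough sketch of how per-vendor rules might work, the example below loads a JSON config and lets a vendor entry override a default pattern. The config schema (`default`, `vendors`), the helper names, and the sample invoices are hypothetical; the real layout of `config.yaml` is not shown in this README. A YAML file could be loaded the same way with PyYAML's `yaml.safe_load`.

```python
import json
import re

# Hypothetical config schema. A raw string keeps the JSON backslash escapes intact.
CONFIG_JSON = r"""
{
  "default": {
    "invoice_number": "Invoice\\s*No\\.?\\s*:\\s*(\\S+)"
  },
  "vendors": {
    "Acme Supplies": {
      "invoice_number": "Ref\\s*#\\s*(\\d+)"
    }
  }
}
"""


def compile_rules(rules):
    """Compile a {field: pattern string} mapping into regex objects."""
    return {field: re.compile(p, re.IGNORECASE) for field, p in rules.items()}


def load_rules(config_text):
    raw = json.loads(config_text)
    return {
        "default": compile_rules(raw["default"]),
        "vendors": {v: compile_rules(r) for v, r in raw["vendors"].items()},
    }


def rules_for(text, rules):
    """Start from the defaults, then apply the first vendor whose name appears in the text."""
    selected = dict(rules["default"])
    for vendor, overrides in rules["vendors"].items():
        if vendor.lower() in text.lower():
            selected.update(overrides)
            break
    return selected


def parse(text, rules):
    parsed = {}
    for field, pattern in rules_for(text, rules).items():
        match = pattern.search(text)
        parsed[field] = match.group(1) if match else None
    return parsed


rules = load_rules(CONFIG_JSON)
acme = parse("Acme Supplies\nRef # 48213\n", rules)
other = parse("Globex Ltd\nInvoice No: G-77\n", rules)
print(acme, other)
```

Keeping the patterns in a config file means a new vendor format can be supported without touching the parsing code.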
- Automatic type coercion and handling of empty fields
- Duplicate row detection and cleanup
- Styled headers, borders, and number formatting using OpenPyXL
- Optional pivot-style summaries (e.g., totals by month, vendor, or item)
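The type coercion, empty-field handling, and duplicate cleanup listed above could look something like the sketch below. It deliberately uses only the standard library so it runs without dependencies; in the tool itself the same steps would more likely go through pandas, for example `pandas.to_numeric(..., errors="coerce")` and `DataFrame.drop_duplicates()`. The column names and the `clean_rows` helper are assumptions for illustration.

```python
from decimal import Decimal, InvalidOperation


def to_decimal(value):
    """Coerce a raw string to Decimal; empty or unparseable values become None."""
    if value is None:
        return None
    cleaned = str(value).replace(",", "").replace("$", "").strip()
    if not cleaned:
        return None
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None


def clean_text(value):
    """Strip whitespace and turn empty strings into None."""
    return (value or "").strip() or None


def clean_rows(rows):
    """Normalize each row, then drop rows that are identical after normalization."""
    seen = set()
    cleaned = []
    for row in rows:
        record = {
            "invoice_number": clean_text(row.get("invoice_number")),
            "description": clean_text(row.get("description")),
            "quantity": to_decimal(row.get("quantity")),
            "unit_price": to_decimal(row.get("unit_price")),
        }
        key = tuple(record.values())
        if key in seen:
            continue  # duplicate of an earlier row
        seen.add(key)
        cleaned.append(record)
    return cleaned


raw_rows = [
    {"invoice_number": "INV-1001", "description": "Widget A", "quantity": "2", "unit_price": "$9.99"},
    {"invoice_number": "INV-1001", "description": "Widget A ", "quantity": "2", "unit_price": "9.99"},
    {"invoice_number": "INV-1002", "description": "Widget B", "quantity": "", "unit_price": "1,250.00"},
]

rows = clean_rows(raw_rows)
for row in rows:
    print(row)
```

Deduplicating after normalization matters here: the first two sample rows differ only in a currency symbol and trailing whitespace, so they collapse into one.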
Invoice2Excel/
│
├── app.py                  # Streamlit UI entry point
├── parser/
│   ├── extract.py          # PDF loading and text extraction
│   ├── parse_invoice.py    # Invoice field parsing logic
│   └── config.yaml         # Custom parsing rules per vendor
│
├── writer/
│   ├── writer.py           # Excel generation with OpenPyXL
│   └── templates/          # Optional formatting templates
│
├── models/
│   └── invoice_row.py      # Pydantic model for validation
├── tests/                  # Unit tests
└── uploads/                # Temporary uploaded PDFs
Prerequisites
- Python 3.10+
- pip
- Git
git clone https://github.com/YusufAlam1/invoice2excel.git
cd invoice2excel
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -r requirements.txt
streamlit run app.py
- ML-based layout detection for flexible invoice structures
- Multi-language support (e.g., French, German)
- Export to CSV, Google Sheets, or JSON
- Sanitized mode for enterprise users
- Dockerized deployment
- UI for defining config profiles (instead of raw YAML)