YusufAlam1/Invoice2Excel

📄 Invoice2Excel

👋 About The Project

Invoice2Excel is an ETL-style desktop tool that extracts structured data from PDF invoices and transforms it into clean, validated, and analysis-ready Excel spreadsheets.

Designed for data analysts, accountants, and operations teams, it cuts manual data entry and keeps formatting consistent across outputs.

Built using Python and Streamlit for a friendly UI, it leverages:

  • pdfplumber, PyMuPDF, and regex for data extraction
  • pandas for transformation
  • OpenPyXL for polished Excel output (including summary sheets, formulas, and optional pivot-style layouts)

✨ Key Features

📥 Upload PDF Invoices

Drag-and-drop or browse to select PDF invoices for parsing.

🔍 Smart Parsing Engine

  • Automatically detects invoice structures using regex and layout cues
  • Extracts key fields: invoice number, date, vendor, line items, quantities, prices, and taxes
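As a rough illustration of regex-based field extraction, the sketch below pulls a few header fields out of text that has already been extracted from a PDF. The sample text, the patterns, and the `parse_header` name are assumptions made for this example, not the project's actual rules.

```python
import re
from datetime import datetime
from decimal import Decimal

# Hypothetical patterns; real invoices vary and usually need per-vendor tuning.
PATTERNS = {
    "invoice_number": re.compile(
        r"Invoice\s*(?:No\.?|Number|#)\s*:?\s*([A-Z0-9-]+)", re.IGNORECASE
    ),
    "invoice_date": re.compile(r"\bDate\s*:?\s*(\d{4}-\d{2}-\d{2})", re.IGNORECASE),
    # \b keeps "Total" from matching inside "Subtotal".
    "total": re.compile(r"\bTotal\s*:?\s*\$?\s*([\d,]+\.\d{2})", re.IGNORECASE),
}


def parse_header(text: str) -> dict:
    """Return the first match for each field, or None when a field is absent."""
    fields = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        fields[name] = match.group(1) if match else None

    # Convert raw strings into typed values where a match was found.
    if fields["invoice_date"]:
        fields["invoice_date"] = datetime.strptime(
            fields["invoice_date"], "%Y-%m-%d"
        ).date()
    if fields["total"]:
        fields["total"] = Decimal(fields["total"].replace(",", ""))
    return fields


# Made-up sample text standing in for the output of a PDF text extractor.
SAMPLE_TEXT = """ACME Supplies Ltd.
Invoice No: INV-2024-0042
Date: 2024-03-15
Subtotal: $1,150.00
Total: $1,299.50
"""

if __name__ == "__main__":
    print(parse_header(SAMPLE_TEXT))
```

Returning `None` for missing fields, rather than raising, leaves it to a later validation step to decide which gaps are acceptable.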

⚙️ Custom Extraction Profiles

Tailor rules to specific vendors or invoice formats using YAML/JSON config files.
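One way such a profile could look is sketched below, using JSON so the example needs only the standard library. The schema (a `vendor` name plus one regex per field) and the `load_profile` and `apply_profile` helpers are hypothetical; the project's own config format may differ.

```python
import json
import re

# Hypothetical profile: field names map to regexes with one capture group.
# The raw string keeps the JSON escapes (\\s, \\w, \\d) intact.
PROFILE_JSON = r"""
{
  "vendor": "ACME Supplies",
  "fields": {
    "invoice_number": "Ref\\s*:\\s*(\\w+)",
    "po_number": "PO\\s*#\\s*(\\d+)"
  }
}
"""


def load_profile(raw: str) -> dict:
    """Parse a JSON profile and precompile its field patterns."""
    profile = json.loads(raw)
    profile["fields"] = {
        name: re.compile(pattern) for name, pattern in profile["fields"].items()
    }
    return profile


def apply_profile(profile: dict, text: str) -> dict:
    """Extract each configured field from text; missing fields become None."""
    result = {"vendor": profile["vendor"]}
    for name, pattern in profile["fields"].items():
        match = pattern.search(text)
        result[name] = match.group(1) if match else None
    return result


if __name__ == "__main__":
    profile = load_profile(PROFILE_JSON)
    print(apply_profile(profile, "Ref: A1234\nPO # 98765"))
```

Keeping the patterns in data rather than code means a new vendor layout can be supported by adding a file instead of changing the parser.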

🧹 Data Cleaning & Validation

  • Automatic type coercion and handling of empty fields
  • Duplicate row detection and cleanup
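The cleaning steps above might be sketched with pandas roughly as follows. The column names, the sample rows, and the `clean_rows` helper are assumptions for illustration only.

```python
import pandas as pd

# Made-up parsed rows: everything arrives as strings, with one duplicate
# (differing only by trailing whitespace) and some unusable values.
SAMPLE_ROWS = [
    {"invoice_number": "INV-1", "invoice_date": "2024-03-15",
     "description": "Widget ", "quantity": "2", "unit_price": "9.99"},
    {"invoice_number": "INV-1", "invoice_date": "2024-03-15",
     "description": "Widget", "quantity": "2", "unit_price": "9.99"},
    {"invoice_number": "INV-2", "invoice_date": "",
     "description": "Gadget", "quantity": "n/a", "unit_price": "5"},
]


def clean_rows(rows: list) -> pd.DataFrame:
    """Coerce column types, mark unparseable values as missing, drop duplicates."""
    df = pd.DataFrame(rows)

    # Normalize text so rows differing only by whitespace compare equal.
    df["description"] = df["description"].str.strip()

    # errors="coerce" turns empty or malformed values into NaN / NaT
    # instead of raising.
    df["quantity"] = pd.to_numeric(df["quantity"], errors="coerce")
    df["unit_price"] = pd.to_numeric(df["unit_price"], errors="coerce")
    df["invoice_date"] = pd.to_datetime(df["invoice_date"], errors="coerce")

    # Remove exact duplicate rows and renumber the index.
    return df.drop_duplicates().reset_index(drop=True)


if __name__ == "__main__":
    print(clean_rows(SAMPLE_ROWS))
```

Coercing before deduplicating matters here: `"2"` and `"2.0"` would only be recognized as the same quantity once both are numeric.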

📊 Excel Output with Formatting

  • Styled headers, borders, and number formatting using OpenPyXL
  • Optional pivot-style summaries (e.g., totals by month, vendor, or item)
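A minimal sketch of this kind of output with OpenPyXL is shown below. The sheet names, colors, column layout, and the `build_workbook` helper are illustrative assumptions rather than the project's actual writer.

```python
from io import BytesIO

from openpyxl import Workbook
from openpyxl.styles import Border, Font, PatternFill, Side


def build_workbook(rows):
    """Build a workbook from (invoice_number, vendor, amount) tuples."""
    wb = Workbook()
    ws = wb.active
    ws.title = "Invoices"
    ws.append(["Invoice", "Vendor", "Amount"])

    # Style the header row: bold white text on a solid fill, thin bottom border.
    thin = Side(style="thin")
    for cell in ws[1]:
        cell.font = Font(bold=True, color="FFFFFF")
        cell.fill = PatternFill(fill_type="solid", start_color="4F81BD")
        cell.border = Border(bottom=thin)

    for row in rows:
        ws.append(row)

    # Apply a thousands-separated, two-decimal format to the amount column.
    for cell in ws["C"][1:]:
        cell.number_format = "#,##0.00"

    # Add a total row that Excel evaluates as a formula.
    last = ws.max_row
    ws.cell(row=last + 1, column=2, value="Total")
    total_cell = ws.cell(row=last + 1, column=3, value=f"=SUM(C2:C{last})")
    total_cell.number_format = "#,##0.00"

    # Pivot-style summary: totals per vendor on a second sheet.
    summary = wb.create_sheet("By Vendor")
    summary.append(["Vendor", "Total"])
    totals = {}
    for _, vendor, amount in rows:
        totals[vendor] = totals.get(vendor, 0) + amount
    for vendor, total in sorted(totals.items()):
        summary.append([vendor, total])
    return wb


if __name__ == "__main__":
    sample = [("INV-1", "ACME", 100.0), ("INV-2", "Globex", 50.5), ("INV-3", "ACME", 25.0)]
    buffer = BytesIO()
    build_workbook(sample).save(buffer)  # a file path works here as well
    print(f"Wrote {len(buffer.getvalue())} bytes")
```

The per-vendor summary is computed in Python and written as plain values; a true Excel pivot table would need a different approach.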

📁 Directory Structure

Invoice2Excel/
│
├── app.py                   # Streamlit UI entry point
├── parser/
│   ├── extract.py           # PDF loading and text extraction
│   ├── parse_invoice.py     # Invoice field parsing logic
│   └── config.yaml          # Custom parsing rules per vendor
│
├── writer/
│   ├── writer.py            # Excel generation with OpenPyXL
│   └── templates/           # Optional formatting templates
│
├── models/
│   └── invoice_row.py       # Pydantic model for validation
├── tests/                   # Unit tests
└── uploads/                 # Temporary uploaded PDFs

🚀 Getting Started

✅ Prerequisites

  • Python 3.10+
  • pip
  • Git

⚙️ Installation

git clone https://github.com/YusufAlam1/invoice2excel.git
cd invoice2excel

(Optional) Create a virtual environment

python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate

Install dependencies

pip install -r requirements.txt

▶️ Run the App

streamlit run app.py

🔮 Future Improvements

  • 🧠 ML-based layout detection for flexible invoice structures
  • 🌐 Multi-language support (e.g., French, German)
  • 💾 Export to CSV, Google Sheets, or JSON
  • 🔐 Sanitized mode for enterprise users
  • 🐳 Dockerized deployment
  • 🧼 UI for defining config profiles (instead of raw YAML)

About

Automate the boring stuff with Python 📄➡️📗
