Lightning-fast CSV data matching engine. Outperforms Excel and Google Sheets by using vectorized hashing and O(1) set lookups. Features a CustomTkinter GUI, batch processing, and multi-tier fuzzy matching

RykonZ/Data-Matcher

⚡ Data Matcher V4.3

License: GPL v3 | Python 3.12+

Data Matcher is a high-performance desktop utility for auditing and matching massive CSV datasets. Built for speed and accuracy, it handles complex data cleaning and cross-referencing in seconds, even with millions of rows.

🚀 Performance Benchmarks

Unlike Excel's VLOOKUP and similar spreadsheet lookups, Data Matcher uses a vectorized matching engine.

  • Small Sets (1k - 50k rows): Near-instant execution.
  • Large Sets (1M+ rows): Processed in seconds using O(1) set-lookup logic.
  • Memory Efficient: Optimized to handle large files without crashing your system.
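
The set-lookup idea behind these numbers can be sketched as follows. This is a minimal illustration using only the standard library, not the code from DataMatcher.py; the helper names `load_keys` and `flag_matches` and the sample data are hypothetical. In pandas, `Series.isin` gives the equivalent vectorized membership test.

```python
import csv
import io

def load_keys(csv_text, column):
    """Collect one column's values into a set (average O(1) membership tests)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[column] for row in reader}

def flag_matches(csv_text, column, reference_keys):
    """Return (value, found) pairs for every row of the source data."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(row[column], row[column] in reference_keys) for row in reader]

# Hypothetical sample data standing in for File A (source) and File B (master).
file_a = "id\n1001\n1002\n1003\n"
file_b = "id\n1002\n1003\n1004\n"

reference = load_keys(file_b, "id")
results = flag_matches(file_a, "id", reference)
print(results)
```

Building the set costs one pass over File B. After that, each row of File A is checked in roughly constant time, so the total work grows linearly with the combined row count rather than with the product of the two file sizes.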

✨ Key Features

  • Multi-Tier Matching:
    • Exact Match: 1:1 identical strings.
    • Trim/Case Neutral: Ignores accidental spaces and capitalization.
    • Symbol Stripping: Matches IDs even if one has dashes, dots, or slashes (e.g., 123-ABC matches 123ABC).
  • Batch Processing: Load multiple source files (File A) to check against one master reference (File B).
  • Audit Trail: Every export includes a Match_Fidelity column and a timestamp for data integrity.
  • Clean UI: Modern Dark Mode interface built with CustomTkinter.
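
The matching tiers, batch processing, and audit columns described above could look roughly like this. It is a sketch under assumptions, not the tool's actual implementation: the function names, the tier labels ("Exact", "Trim/Case", "Symbols Stripped", "No Match"), the `Timestamp` column name, and the sample values are all hypothetical. Only the `Match_Fidelity` column name comes from the feature list.

```python
from datetime import datetime, timezone

def normalize_case(value):
    """Tier 2 key: ignore surrounding spaces and capitalization."""
    return value.strip().casefold()

def strip_symbols(value):
    """Tier 3 key: additionally drop every non-alphanumeric character."""
    return "".join(ch for ch in normalize_case(value) if ch.isalnum())

def build_index(reference_values):
    """Precompute one lookup set per tier from the master file (File B)."""
    exact = set(reference_values)
    neutral = {normalize_case(v) for v in reference_values}
    stripped = {strip_symbols(v) for v in reference_values}
    return exact, neutral, stripped

def match_fidelity(value, index):
    """Return the label of the strictest tier at which the value matches."""
    exact, neutral, stripped = index
    if value in exact:
        return "Exact"
    if normalize_case(value) in neutral:
        return "Trim/Case"
    key = strip_symbols(value)
    if key and key in stripped:  # skip values that are only symbols
        return "Symbols Stripped"
    return "No Match"

# Hypothetical master reference (File B) and two source batches (File A).
master = ["123ABC", "XY-77", "Q9"]
sources = {"a1.csv": ["123-ABC", " xy-77 "], "a2.csv": ["Q9", "ZZZ"]}

index = build_index(master)
stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
report = [
    {"Source": name, "Value": v,
     "Match_Fidelity": match_fidelity(v, index), "Timestamp": stamp}
    for name, values in sources.items()
    for v in values
]
for row in report:
    print(row)
```

Checking the strictest tier first means each row records how much cleanup was needed before it matched, which is the information an audit column is meant to carry.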

💪 Reliability & Stability

  • Zero Crashes: During extensive stress testing with datasets exceeding 1,000,000 rows, the engine remained stable and did not crash.
  • Pro Tip: For the highest match accuracy, enable both "Auto-Trim" and "Strip Symbols". This keeps hidden characters, stray spaces, and formatting differences (like dashes in serial numbers) from preventing a successful match.

⚠️ Disclaimer

This software is provided "as is", without warranty of any kind, express or implied. In no event shall the author (RykonZ) be liable for any claim, damages, or other liability, whether in an action of contract, tort, or otherwise, arising from, out of, or in connection with the software or the use or other dealings in the software.

Always back up your data before performing large-scale audits.

☕ Support My Work

If Data Matcher has saved you time or made your work easier, feel free to Buy Me a Coffee!

Please Note: Donations are a wonderful way to show appreciation and support my creative journey as an independent developer. They are gifts of "thanks" and do not represent a contract for faster development, priority feature requests, or dedicated technical support. I build and update this tool out of passion for the community!

🤖 AI Usage Statement

AI-assisted tools were used during development in a limited capacity, primarily for code formatting, comment refinement, and structural readability improvements.

The core algorithms, data-processing logic, and application design were created by me (RykonZ). All AI-assisted output was reviewed, modified where necessary, and integrated manually.

🛠️ Installation & Usage

Option 1: Standalone (.exe)

  1. Go to the Releases page.
  2. Download DataMatcher.zip.
  3. Extract and run DataMatcher.exe.
    • Note: Because the software is self-signed by me (RykonZ), Windows may show a "SmartScreen" warning. Click 'More info' -> 'Run anyway'.

Option 2: Run from Source

If you have Python installed:

pip install pandas customtkinter
python DataMatcher.py
